Abstract. — A Wasserstein space is a metric space of sufficiently concentrated 
probability measures over a general metric space. The main goal of this paper 
is to estimate the largeness of Wasserstein spaces, in a sense to be made precise. 

In a first part, we generalize the Hausdorff dimension by defining a family 
of bi-Lipschitz invariants, called critical parameters, that measure largeness 
for infinite-dimensional metric spaces. Basic properties of these invariants are 
given, and they are estimated for a natural set of spaces generalizing the usual 
Hilbert cube. These invariants are very similar to concepts initiated by Rogers, 
but our variant is specifically suited to tackle Lipschitz comparison. 

In a second part, we estimate the value of these new invariants in the case of 
some Wasserstein spaces, as well as the dynamical complexity of push-forward 
maps. The lower bounds rely on several embedding results; for example we 
provide uniform bi-Lipschitz embeddings of all powers of any space inside its 
Wasserstein space and we prove that the Wasserstein space of a d-manifold 
has "power-exponential" critical parameter equal to d. These arguments are 
very easily adapted to study the space of closed subsets of a compact metric 
space, partly generalizing results of Boardman, Goodey and McClure. 



1. Introduction 

This article is motivated by the geometric study of Wasserstein spaces; these 
are spaces of probability measures over a metric space, which are often infinite- 
dimensional for any sensible definition of dimension (in particular Hausdorff 
dimension). This statement seemed to deserve to be made quantitative, and 
very few relevant invariants seemed available. We shall therefore develop such 
tools in a first part, then apply them to Wasserstein spaces via embedding 
results in a second part. 
1.1. A generalization of Hausdorff dimension: critical parameters. — The
construction of Hausdorff dimension relies on a family of functions,
namely $(r \mapsto r^s)_s$, and one can wonder what happens when this family is
replaced by another one. This is exactly what we do: we give conditions on 
a family of functions (then called a scale) ensuring that a family of measures 
obtained by the so-called Caratheodory construction from these functions behave
more or less like Hausdorff measures do. In particular these criteria
ensure the existence of a critical parameter that plays the role of Hausdorff 
dimension, and the Lipschitz invariance of this parameter. It follows that 
any bi-Lipschitz embedding of a space into another implies an inequality be- 
tween their critical parameters. We shall use three main scales relevant for 
increasingly large spaces: the polynomial scale, which defines the Hausdorff 
dimension; the intermediate scale and the power-exponential scale. We shall 
say for example that a space has intermediate size if it has a non-extremal 
critical parameter in the intermediate scale, which implies that it has infinite 
Hausdorff dimension and minimal critical parameter in the power-exponential 
scale. 

This line of ideas is far from being new: Rogers' book [Rog70] shows that
constructions of this kind were well understood forty years ago. Several works
have considered infinite-dimensional metric spaces, mostly the set of closed 
subsets of the interval, and determined for some functions whether they lead 
to zero or infinite measures; see in particular [Boa73], [Goo77], [McC97].

Concerning the definition of critical parameters, our main point is to stress 
conditions ensuring their bi-Lipschitz invariance. But the real contribution of 
this paper lies in the computation of critical parameters for a variety of spaces, 
partly generalizing the above papers. 

Hausdorff dimension is easy to interpret because the Euclidean spaces can
be used for size comparison. There is a natural family of spaces that can play
the same role for some families of critical parameters: Hilbert cubes. Given an
$\ell^2$ sequence of positive real numbers $a = (a_n)_{n \in \mathbb{N}}$ (the classical choice being
$a_n = 1/n$), let $\mathrm{HC}(I; a)$ be the set of all sequences $u$ such that $0 \le u_n \le a_n$ for
all $n$, and endow it with the $\ell^2$ metric. Here $I$ stands for the unit interval, and
the construction generalizes to any compact metric space $X$: the (generalized)
Hilbert cube $\mathrm{HC}(X; a)$ is the set of sequences $x = (x_n) \in X^{\mathbb{N}}$ endowed with
the metric

$$d_a(x, y) := \Big( \sum_{n} a_n^2\, d(x_n, y_n)^2 \Big)^{1/2}.$$

The main results of the first part are estimations of the critical parameters 
of generalized Hilbert cubes. In particular, we prove that under positive and 
finite dimensionality hypotheses, HC(X, a) has intermediate size if a decays 
exponentially, and has power-exponential size if a decays polynomially. 
To illustrate this, let us give a consequence of our estimations. 

Corollary 1.1. — Let $X$, $Y$ be any two compact metric spaces, assume $X$ has
positive Hausdorff dimension and $Y$ has finite upper Minkowski dimension,
and consider two exponents $\alpha < \beta \in (1/2, +\infty)$.

Then there is no bi-Lipschitz embedding $\mathrm{HC}(X; (n^{-\alpha})) \hookrightarrow \mathrm{HC}(Y; (n^{-\beta}))$.

This non-embedding result, as well as a similar result described below, is
different in nature from the celebrated results of Bourgain [Bou86] (a regular
tree admits no bi-Lipschitz embedding into a Hilbert space), Pansu [Pan89]
and Cheeger and Kleiner [CK10] (the Heisenberg group admits no bi-Lipschitz
embedding into a finite-dimensional Banach space nor into $L^1$). These results involve the
fine structure of metric spaces, while our approach is much cruder: all our 
non-embedding results come from one space being simply too big to fit into 
another. 

Our methods are similar to those used in Hausdorff dimension theory: we 
rely on Frostman's Lemma, which says that in order to bound from below the 
critical parameter it is sufficient to exhibit a measure whose local behavior 
is controlled by one of the scale functions, and on an analogue of Minkowski 
dimension, which gives upper bounds. 

This analogue might be considered the most straightforward manner to 
measure the largeness of a compact space: it simply encodes the asymptotics 
of the minimal size of an $\varepsilon$-covering when $\varepsilon$ goes to zero. However, the Minkowski
dimension has some undesirable behavior, notably with respect to countable 
unions; this already makes Hausdorff dimension more satisfactory, and the 
same argument applies in favor of our critical parameters. 

Whatever scale is used, the construction of critical parameters relies on the 
existence for all $\varepsilon > 0$ of at least one covering of the space by a sequence of
parts $E_n$ whose diameter is at most $\varepsilon$ and goes to $0$ when $n$ goes to $\infty$. This
property has been studied under the names "small ball property" and "largest
Hausdorff dimension", see the works of Goodey [Goo70], Bandt [Ban81] and
Behrends and Kadets [BK01]. In particular, it is proved in [Goo70] and
[BK01] that the unit ball of an infinite-dimensional Banach space never has
the small ball property. As a consequence, our critical parameters cannot be 
used to measure the largeness of Banach spaces, apart from the obvious rela- 
tion between Hausdorff dimension and linear dimension of finite-dimensional 
Banach spaces. 

1.2. Largeness of Wasserstein spaces. — The second part of this article 
is part of a series, partly joint with Jerome Bertrand, in which we study some 
intrinsic geometric properties of the Wasserstein spaces $\mathscr{W}_p(X)$ of a metric
space $(X, d)$. These spaces of measures are in some sense geometric measure




BENOIT KLOECKNER 



theory versions of $L^p$ spaces (see Section 5 for precise definitions). Here we
evaluate the largeness of Wasserstein spaces, mostly via embedding results. 

Other authors have worked on related topics, for example Lott [Lot08], who
computed the curvature of Wasserstein spaces over manifolds (see also Takatsu
[Tak08]), and Takatsu and Yokota [TY09], who studied the case when $X$ is
a metric cone.

Several embedding and non-embedding results are proved in previous articles
for special classes of spaces $X$, in the most important case $p = 2$. On the
one hand, it is easy to see that if $X$ contains a complete geodesic (that is, an
isometric embedding of $\mathbb{R}$), then $\mathscr{W}_2(X)$ contains isometric embeddings of open
Euclidean cones of arbitrary dimension [Klo10a]. In particular it contains
isometric embeddings of Euclidean balls of arbitrary dimension and radius, and
bi-Lipschitz embeddings of $\mathbb{R}^k$ for all $k$. On the other hand, if $X$ is
negatively curved and simply connected, $\mathscr{W}_2(X)$ does not contain any isometric
embedding of $\mathbb{R}^2$ [BK10].

1.2.1. Embedding powers. — First we describe a bi-Lipschitz embedding of
$X^k$. This power set can be endowed with several equivalent metrics, for
example

$$d_p\big(x = (x_1, \dots, x_k),\ y = (y_1, \dots, y_k)\big) = \Big( \sum_{i=1}^{k} d(x_i, y_i)^p \Big)^{1/p}$$

and

$$d_\infty(x, y) = \max_{1 \le i \le k} d(x_i, y_i),$$

which come out naturally in the proof; moreover $d_\infty$ is well-suited to the
dynamical application below.

Theorem 1.2. — Let $X$ be any metric space, $p \in [1, \infty)$ and $k$ be any positive
integer. There exists a map $f : X^k \to \mathscr{W}_p(X)$ such that for all $x, y \in X^k$:

$$\frac{1}{k\,(2^k - 1)^{1/p}}\, d_p(x, y) \le W_p(f(x), f(y)) \le \Big( \frac{2^{k-1}}{2^k - 1} \Big)^{1/p} d_p(x, y)$$

and that intertwines dynamical systems in the following sense: given any
measurable self-map $\varphi$ of $X$, denoting by $\varphi_k$ the induced map on $X^k$ and by
$\varphi_\#$ the induced map on measures, it holds

$$f \circ \varphi_k = \varphi_\# \circ f.$$

Note that since $d_\infty \le d_p \le k^{1/p} d_\infty$ similar bounds hold with $d_\infty$; in fact the
lower bound that comes from the proof is in terms of $d_\infty$ and is slightly better:

$$\frac{1}{k^{1 - 1/p}\,(2^k - 1)^{1/p}}\, d_\infty(x, y) \le W_p(f(x), f(y)).$$
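
The upper constant in Theorem 1.2 can be checked numerically on the real line. The sketch below is only an illustration under stated assumptions: the map `embed`, which puts the weights $2^{k-i}/(2^k-1)$ on the points $x_i$, is a hypothetical candidate that is not described in this section (the actual construction is given in Section 6), and `wasserstein_line` uses the fact that on $\mathbb{R}$ the monotone coupling is optimal for the cost $|x-y|^p$ with $p \ge 1$.

```python
from fractions import Fraction


def weights(k):
    """Hypothetical weights 2^(k-i) / (2^k - 1), i = 1..k (they sum to 1)."""
    return [Fraction(2 ** (k - i), 2 ** k - 1) for i in range(1, k + 1)]


def embed(x):
    """Assumed map X^k -> measures: atoms (position, mass) with the weights above."""
    return list(zip(x, weights(len(x))))


def d_p(x, y, p):
    """The l^p product metric on R^k."""
    return sum(abs(s - t) ** p for s, t in zip(x, y)) ** (1.0 / p)


def wasserstein_line(mu, nu, p):
    """W_p between two finitely supported probability measures on R,
    computed with the monotone (sorted) coupling."""
    a, b = sorted(mu), sorted(nu)
    i = j = 0
    mass_a, mass_b = a[0][1], b[0][1]
    cost = 0.0
    while True:
        moved = min(mass_a, mass_b)
        cost += float(moved) * abs(a[i][0] - b[j][0]) ** p
        mass_a -= moved
        mass_b -= moved
        if mass_a == 0:
            i += 1
            if i == len(a):
                break  # all mass has been transported
            mass_a = a[i][1]
        if mass_b == 0:
            j += 1
            if j == len(b):
                break
            mass_b = b[j][1]
    return cost ** (1.0 / p)


x = (0.0, 1.0, 5.0)
y = (0.5, -2.0, 5.0)
upper = (2 ** 2 / (2 ** 3 - 1)) ** 0.5  # (2^(k-1) / (2^k - 1))^(1/p), k = 3, p = 2
print(wasserstein_line(embed(x), embed(y), 2), upper * d_p(x, y, 2))
```

Masses are kept as exact fractions so that the transport loop terminates exactly when both measures are exhausted. Only the upper bound is checked here; the lower bound depends on the actual construction of $f$.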
This result is proved in Section 6.

We shall see in Section 6.2 that the constants cannot be improved much for
general spaces, but that for some specific spaces, a bi-Lipschitz map with a 
lower bound polynomial in k can be constructed. This map however does not 
enjoy the intertwining property. 

The explicit constants in Theorem 1.2 can be used to get information on
largeness in the Minkowski sense only, since critical parameters are designed 
not to grow under countable unions. Let us give a more dynamical application 
that uses the intertwining property in a crucial way. 

Corollary 1.3. — If $X$ is compact and $\varphi : X \to X$ is a continuous map
with positive topological entropy, then $\varphi_\#$ has positive metric mean dimension.
More precisely

$$\mathrm{mdim}_M(\varphi_\#, W_p) \ge p\, \frac{h_{\mathrm{top}}(\varphi)}{\log 2}.$$

Metric mean dimension is a metric invariant of dynamical systems that refines
entropy for infinite-entropy ones, introduced by Lindenstrauss and Weiss
[LW00] in connection with mean dimension, a topological invariant. The definition
of $\mathrm{mdim}_M$ is recalled in Section 6.3.

Note that the constant in Corollary 1.3 is not optimal in the case of
multiplicative maps $\times d$ acting on the circle: in [Klo10b] we prove the lower bound
$p(d - 1)$ (instead of $p \log_2 d$ here).

It is a natural question to ask whether the (topological) mean dimension of
$\varphi_\#$ is positive as soon as $\varphi$ has positive entropy. To determine this at least
for some map $\varphi$ would be interesting.

1.2.2. Embedding Hilbert cubes. — Since embedding powers cannot be enough
to estimate critical parameters, we shall embed Hilbert cubes in Wasserstein
spaces. From now on, we restrict to quadratic cost (similar results probably
hold for other exponents, up to replacing Hilbert cubes by analogues).

Theorem 1.4. — Given any $\lambda \in (0, 1/3)$ and any compact metric space $X$,
there is a continuous map $g : \mathrm{HC}(X, (\lambda^n)) \to \mathscr{W}_2(X)$ that is sub-Lipschitz: for
some $C > 0$,

$$W_2(g(x), g(y)) \ge \frac{d(x, y)}{C}.$$

The embedding we construct here is not bi-Lipschitz, but this does not
matter to get lower bounds on critical parameters.

For rectifiable enough spaces, we can use the self-similarity of the Euclidean
space to get a much stronger statement.

Theorem 1.5. — Let $X$ be any Polish metric space that admits a bi-Lipschitz
embedding of a Euclidean cube $I^d$ (e.g. any manifold of dimension $d$), and let
$(a_n)$ be any $\ell^{\frac{2d}{d+2}}$ sequence of positive numbers. Then there is a bi-Lipschitz
embedding of $\mathrm{HC}(I^d, (a_n))$ into $\mathscr{W}_2(X)$.

The embedding theorems 1.4 and 1.5 have consequences in terms of critical
parameters (defined precisely in Part I).

Proposition 1.6. — If $X$ is any compact metric space of positive Hausdorff
dimension, then $\mathscr{W}_2(X)$ has at least intermediate size, and more precisely

$$\mathrm{crit}_{\mathcal{I}}\, \mathscr{W}_2(X) \ge 2, \qquad \mathrm{crit}_{\mathcal{I}_2}\, \mathscr{W}_2(X) \ge \frac{\dim X}{2 \log 3}.$$

This estimate is very far from being sharp for many spaces, but it has the
advantage of being completely general.

The second embedding result gives a much more precise statement when X 
is sufficiently regular. 

Theorem 1.7. — If $X$ is a compact $d$-dimensional manifold (or any compact
space having upper Minkowski dimension $d$ and admitting a bi-Lipschitz
embedding of $I^d$), then $\mathscr{W}_2(X)$ has power-exponential size, and more precisely

$$\mathrm{crit}_{\mathcal{P}}\, \mathscr{W}_2(X) = d.$$

The upper and lower bounds are proved independently under partial
hypotheses, see Propositions 7.3 and 7.4. A direct consequence of Theorem 1.7
is that if $X$, $X'$ are manifolds of respective dimensions $d > d'$, then there exists
no bi-Lipschitz embedding from $\mathscr{W}_2(X)$ to $\mathscr{W}_2(X')$.
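
The way the lower bound in Theorem 1.7 matches the dimension can be traced with a small exact computation. The sketch below assumes the summability exponent $2d/(d+2)$ of Theorem 1.5 and the value $2/(2\alpha-1)$ of Proposition 4.3; the function names are illustrative only. A sequence $(n^{-\alpha})$ is in $\ell^q$ exactly when $\alpha > 1/q$, so the admissible exponents are $\alpha > (d+2)/(2d)$, and the corresponding critical parameters $2/(2\alpha-1)$ approach $d$ from below.

```python
from fractions import Fraction


def summability_threshold(d):
    """Infimum of the exponents alpha for which (n^-alpha) lies in l^q, q = 2d/(d+2)."""
    q = Fraction(2 * d, d + 2)
    return 1 / q


def power_exponential_parameter(alpha):
    """The value 2 / (2*alpha - 1) appearing in Proposition 4.3 (needs alpha > 1/2)."""
    return 2 / (2 * alpha - 1)


for d in range(1, 6):
    alpha0 = summability_threshold(d)
    # At the threshold the formula gives exactly d; admissible alpha give slightly less.
    print(d, alpha0, power_exponential_parameter(alpha0))
```

Since the threshold itself is excluded, the embedding argument gives every value strictly below $d$, and the supremum over admissible $\alpha$ is $d$.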

A surprise about the proof is that the methods for the upper and the lower
bound are very different and can both seem quite rough (see the proofs in
Section 7.3), but they nevertheless give the same order of magnitude. The
fact that the power-exponential critical parameter of the Wasserstein space
coincides with the dimension of the original space in the case of manifolds is
an indication that the power-exponential scale is relevant.

It is an open problem to find a relevant "uniform" probability measure 
on $\mathscr{W}_2(X)$ (see [vRS09]). Knowing the critical parameter of a space, the
Caratheodory construction provides a Hausdorff-like measure, which unfortu- 
nately need not be finite positive. One could hope to find a function such that 
the Caratheodory construction leads to a finite positive measure, which would 
then be a natural candidate to uniformity, in particular because the construc- 
tion depends only on the geometry of the space. Our result, while far from 
answering the question, at least gives an idea of the infinitesimal behavior of 
any such candidate: the desired function should be very roughly of the order 
of magnitude of $r \mapsto \exp(-(1/r)^d)$ when $X$ is a $d$-manifold. However, it is
unlikely that the Caratheodory construction can be used to produce such a
measure. In the quite similar case of the space of closed subsets of the interval,
endowed with the Hausdorff metric, it has indeed been proved [Boa73] that
no function yields a Hausdorff-like measure that is both positive and $\sigma$-finite.
It would be interesting to determine whether this result holds in the case of 
Wasserstein spaces. 

1.3. Largeness of closed subset spaces. — The same methods used on 
Wasserstein spaces can also be used to study the space of closed subsets of a 
compact metric space. We shall end the paper in Section 8 with the proof of
the following result. 

Theorem 1.8. — Let $X$ be a compact $d$-manifold (or any compact space having
upper Minkowski dimension $d$ and admitting a bi-Lipschitz embedding of
$I^d$). Then the space $\mathscr{C}(X)$ of closed subsets of $X$, endowed with the Hausdorff
metric, has power-exponential size and more precisely

$$\mathrm{crit}_{\mathcal{P}}\, \mathscr{C}(X) = d.$$

This result should be compared with those of Boardman [Boa73] and
Goodey [Goo77], which together give a refinement of Theorem 1.8 when
$X = [0, 1]$, and of McClure [McC97], which applies to self-similar subsets
of Euclidean space that satisfy a strong separation property.

Acknowledgements. — I warmly thank Antoine Gournay for a very interesting
discussion and for introducing me to metric mean dimension, Greg Kuperberg
who suggested to me that Hausdorff dimension could be generalized, and
an anonymous referee for his numerous comments that greatly improved the
paper.



PART I 

A GENERALIZATION OF HAUSDORFF DIMENSION: 
CRITICAL PARAMETERS 



2. Caratheodory's construction and scales 

In this section we consider metric spaces X, Y (assumed to be Polish, that 
is complete and separable, to avoid any measurability issue) and we use the 
letters A, B to denote subsets of X. 

2.1. Caratheodory's construction of measures. — The starting point 
of our invariant is a classical construction due to Caratheodory (see [Mat95]
for references and proofs) that we quickly review. The idea is to count the 
number of elements in coverings of $A$ by small sets $E_i$, weighting each set by
a function of its diameter. 

Let $f : [0, T) \to [0, +\infty)$ be a continuous non-decreasing function such that
$f(0) = 0$. Given a subset $A$ of $X$, one defines a Borel measure by

$$\Lambda_f(A) = \lim_{\delta \to 0} \inf \Big\{ \sum_{i=1}^{\infty} f(\operatorname{diam} E_i) \;\Big|\; A \subset \bigcup_i E_i,\ \operatorname{diam} E_i \le \delta,\ E_i \text{ closed} \Big\}$$

where the limit exists since the infimum is monotone. If $f(x) = x^s$, $\Lambda_f$ is the
$s$-dimensional Hausdorff measure (up to normalization).

We shall say that $(E_i)$ is a closed covering of $A$ if it is a covering by closed
elements, and a $\delta$-covering if all $E_i$ have diameter at most $\delta$.
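
As a quick illustration of the construction, one can evaluate the weighted sums for one explicit family of coverings. The sketch below takes $A = [0, 1]$, the covering by $n$ closed intervals of length $1/n$, and the polynomial gauge $f(r) = r^s$; the function name is illustrative. These sums only bound the infimum from above, so by themselves they show $\Lambda_f([0,1]) = 0$ for $s > 1$ and finiteness for $s = 1$; the divergence for $s < 1$ is the behavior of this particular family of coverings.

```python
def covering_sum(n, s):
    """Sum of f(diam E_i) for the covering of [0, 1] by n closed intervals
    of length 1/n, with the gauge f(r) = r**s."""
    return n * (1.0 / n) ** s


for s in (0.5, 1.0, 2.0):
    # The sums grow, stay at 1, or shrink to 0 depending on s versus 1.
    print(s, [covering_sum(n, s) for n in (10, 100, 1000)])
```

The change of behavior at $s = 1$ is the phase transition that the critical parameter is designed to capture.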

2.2. Scales and critical parameters. — We shall perform Caratheodory's 
construction for a family of functions, and we need some conditions to ensure 
that a sharp phase transition occurs. 

Definition 2.1. — A scale is a family $\mathcal{F}$ of continuous non-decreasing
functions $f_s : [0, T_s) \to [0, +\infty)$ such that $f_s(0) = 0$, where the parameter $s$ runs
over an interval $I \subset \mathbb{R}$, and which satisfies the following separation property:

$$\forall t > s \in I,\ \forall C \ge 1, \qquad f_t(Cr) = o_{r \to 0}(f_s(r)).$$

The following families are the scales we shall use below. The polynomial
scale (or dimensional scale) is

$$\mathcal{D} := (r \mapsto r^s)_{s \in (0, +\infty)}$$

and its critical parameter (to be defined below) is Hausdorff dimension. The
intermediate scales (or power-log-exponential scales) are divided into a coarse
scale

$$\mathcal{I} := \big( r \mapsto e^{-(\log \frac{1}{r})^s} \big)_{s \in [1, +\infty)}$$

and, for each $\sigma \in [1, +\infty)$, a fine scale

$$\mathcal{I}_\sigma := \big( r \mapsto e^{-s (\log \frac{1}{r})^\sigma} \big)_{s \in (0, +\infty)};$$

note that $\mathcal{I}_1 = \mathcal{D}$. The power-exponential scale is

$$\mathcal{P} := \big( r \mapsto e^{-(1/r)^s} \big)_{s \in (0, +\infty)}.$$

The parameter $s = 1$ corresponds to exponential size; while one could consider
giving a more precise scale in this case, the family $(r \mapsto \exp(-s/r))_s$ does not
define one: it does not satisfy the separation property, and would lead to a
critical parameter that is not bi-Lipschitz invariant.
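
The separation property $f_t(Cr) = o(f_s(r))$ can be probed numerically by looking at $\log\big(f_t(Cr)/f_s(r)\big)$ as $r \to 0$: it should tend to $-\infty$. The sketch below works with logarithms to avoid underflow and compares the polynomial family $r^s$, the power-exponential family $\exp(-(1/r)^s)$, and the family $\exp(-s/r)$; the function names and the sample values of $s$, $t$, $C$ are illustrative choices.

```python
import math


def log_ratio_polynomial(s, t, C, r):
    """log( f_t(C r) / f_s(r) ) for f_s(r) = r**s."""
    return t * math.log(C * r) - s * math.log(r)


def log_ratio_power_exp(s, t, C, r):
    """log( f_t(C r) / f_s(r) ) for f_s(r) = exp(-(1/r)**s)."""
    return -(1.0 / (C * r)) ** t + (1.0 / r) ** s


def log_ratio_exponential(s, t, C, r):
    """log( f_t(C r) / f_s(r) ) for f_s(r) = exp(-s/r); equals (s - t/C) / r."""
    return -t / (C * r) + s / r


s, t, C = 1.0, 2.0, 10.0
for k in range(1, 7):
    r = 10.0 ** (-k)
    print(r,
          log_ratio_polynomial(s, t, C, r),
          log_ratio_power_exp(s, t, C, r),
          log_ratio_exponential(s, t, C, r))
```

With $C > t/s$ the last column goes to $+\infty$ instead of $-\infty$, which is the failure of the separation property for the family $\exp(-s/r)$: a dilation by $C$ can undo the gain from increasing the parameter.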
Consider a scale $\mathcal{F} = (f_s)_{s \in I}$ and a subset $A$ of $X$. We have, like in the case
of Hausdorff measures and with the same proof (using the separation property
only with $C = 1$):

Lemma 2.2. — For all parameters $t > s \in I$, if $\Lambda_{f_t}(A) > 0$ then $\Lambda_{f_s}(A) = +\infty$.

This leads to the equalities in the following. 

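Definition 2.3. — The critical parameter of $A$ with respect to the scale $\mathcal{F}$
is the number

$$\begin{aligned}
\mathrm{crit}_{\mathcal{F}} A &:= \sup\{ s \in I \mid \Lambda_{f_s}(A) = +\infty \} \\
&= \sup\{ s \in I \mid \Lambda_{f_s}(A) > 0 \} \\
&= \inf\{ s \in I \mid \Lambda_{f_s}(A) = 0 \} \\
&= \inf\{ s \in I \mid \Lambda_{f_s}(A) < +\infty \}.
\end{aligned}$$

Note that the critical parameter belongs to the closure of $I$ in $\overline{\mathbb{R}}$.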

2.3. Basic properties of the critical parameter. — The critical pa- 
rameter defined by any scale shares many properties with the Hausdorff 
dimension. 

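Proposition 2.4. — The following properties hold:

— (monotonicity) if $A \subset B \subset X$, then $\mathrm{crit}_{\mathcal{F}} A \le \mathrm{crit}_{\mathcal{F}} B$,

— (countable union) for any countable family of sets $A_i \subset X$,

$$\mathrm{crit}_{\mathcal{F}} \Big( \bigcup_i A_i \Big) = \sup_i \mathrm{crit}_{\mathcal{F}} A_i,$$

— (Lipschitz monotonicity) if there is a sub-Lipschitz map from $X$ to another
metric space $Y$, then

$$\mathrm{crit}_{\mathcal{F}} X \le \mathrm{crit}_{\mathcal{F}} Y,$$

— (Lipschitz invariance) if there is a bi-Lipschitz map from $X$ onto another
metric space $Y$, then

$$\mathrm{crit}_{\mathcal{F}} X = \mathrm{crit}_{\mathcal{F}} Y.$$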

Proof. — The monotonicity and countable union properties are straightforward
since $\Lambda_{f_s}$ is a measure for all $s$. The Lipschitz monotonicity and Lipschitz
invariance are proved just like the invariance of Hausdorff dimension,
using the separation property.

More precisely, let $g : X \to Y$ be a sub-Lipschitz map: for some $D > 0$ and
all $x, x' \in X$,

$$d(g(x), g(x')) \ge D\, d(x, x').$$

Given any countable closed $D\delta$-covering $(F_i)$ of $Y$, the sets $E_i = g^{-1}(F_i)$ are
closed, of diameter at most $D^{-1} \operatorname{diam} F_i \le \delta$ and cover $X$. By the separation
property, given any $s < t$ in the parameter set of $\mathcal{F}$, there is a $\delta_0$ such that
for all $r \in (0, \delta_0)$ we have $f_t(D^{-1} r) \le f_s(r)$. If $\Lambda_{f_s}(Y) = 0$, then we can
find coverings $(F_i)$ of $Y$ of arbitrarily low diameter making $\sum f_s(\operatorname{diam} F_i)$
arbitrarily low. It follows that the corresponding coverings $(E_i)$ of $X$ make
$\sum f_t(\operatorname{diam} E_i)$ arbitrarily low, so that $\Lambda_{f_t}(X) = 0$. Letting $s$ and $t$ approach
the critical parameter of $Y$ shows that

$$\mathrm{crit}_{\mathcal{F}} Y \ge \mathrm{crit}_{\mathcal{F}} X.$$

If there is a bi-Lipschitz equivalence between $X$ and $Y$, we get the other
inequality by symmetry. □



3. Estimation tools



Let us give two tools to estimate the critical parameter of a given set. 
Both are direct analogues of standard tools used for Hausdorff dimension. We 
consider here a fixed Polish metric space $X$ and a given scale $\mathcal{F} = (f_s)_{s \in I}$.

3.1. Upper bounds via growth of coverings. — The most evident way
to measure the size of a compact set $A$ is to consider the growth of the minimal
number $N(A, \varepsilon)$ of radius $\varepsilon$ balls needed to cover $A$ when $\varepsilon \to 0$. If $N(A, \varepsilon)$ is
roughly $(1/\varepsilon)^d$, more precisely if

$$\lim_{\varepsilon \to 0} \frac{\log N(A, \varepsilon)}{\log(1/\varepsilon)} = d,$$

then one says that $A$ has Minkowski dimension (or M-dimension for short, also
called box dimension) equal to $d$. The limit need not exist, and one defines the
upper and lower M-dimensions by replacing it by a supremum or infimum
limit. Equivalently, one can define these dimensions by

$$\overline{\text{M-dim}}(A) = \inf \Big\{ s > 0 \;\Big|\; \limsup_{\varepsilon \to 0} N(A, \varepsilon)\, \varepsilon^s < +\infty \Big\}$$

$$\underline{\text{M-dim}}(A) = \inf \Big\{ s > 0 \;\Big|\; \liminf_{\varepsilon \to 0} N(A, \varepsilon)\, \varepsilon^s < +\infty \Big\}$$

which is much more easily generalized to arbitrary scales.

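Definition 3.1. — The upper and lower Minkowski critical parameters of a
compact set $A \subset X$ with respect to the scale $\mathcal{F}$ are defined as

$$\overline{\text{M-crit}}_{\mathcal{F}}(A) := \inf \Big\{ s \in I \;\Big|\; \limsup_{\varepsilon \to 0} N(A, \varepsilon)\, f_s(\varepsilon) < +\infty \Big\}$$

$$\underline{\text{M-crit}}_{\mathcal{F}}(A) := \inf \Big\{ s \in I \;\Big|\; \liminf_{\varepsilon \to 0} N(A, \varepsilon)\, f_s(\varepsilon) < +\infty \Big\}$$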
It is clear from the definition that $\underline{\text{M-crit}}_{\mathcal{F}}(A) \le \overline{\text{M-crit}}_{\mathcal{F}}(A)$, and there are
several other equivalent ways to define the Minkowski critical parameters, for
example

$$\underline{\text{M-crit}}_{\mathcal{F}}(A) = \sup \Big\{ s \in I \;\Big|\; \liminf_{\varepsilon \to 0} N(A, \varepsilon)\, f_s(\varepsilon) > 0 \Big\}.$$

The following result enables one to get upper bounds on the critical parameter.

Proposition 3.2. — The following inequality always holds:

$$\mathrm{crit}_{\mathcal{F}}(A) \le \underline{\text{M-crit}}_{\mathcal{F}}(A).$$

Proof. — For all positive $\varepsilon$, there is a covering $(B_i)$ of $A$ by $N(A, \varepsilon)$ balls of
radius $\varepsilon$. Given any $t > s > \underline{\text{M-crit}}_{\mathcal{F}}(A)$ we have

$$\sum_i f_t(\operatorname{diam} B_i) \le N(A, \varepsilon)\, f_t(2\varepsilon) \le N(A, \varepsilon)\, f_s(\varepsilon)$$

as soon as $\varepsilon$ is small enough. Passing to an infimum limit, we get $\Lambda_{f_t}(A) = 0$
and thus $\mathrm{crit}_{\mathcal{F}} A \le t$. □

Unfortunately, there is no way to have a lower bound of the critical parameter
in terms of these Minkowski versions. The classical counter-example is
the set $\{0, 1, 1/2, 1/3, \dots\}$, which has Minkowski dimension $1/2$ but is countable,
and thus has Hausdorff dimension $0$. This is one of the reasons to introduce
Hausdorff dimension and more general critical parameters: Minkowski critical
parameters can grow significantly under countable union.

They however share the other properties of critical parameters.

Proposition 3.3. — The upper and lower Minkowski critical parameters satisfy
the monotonicity and Lipschitz invariance properties:

— if $A \subset B \subset X$, then

$$\underline{\text{M-crit}}_{\mathcal{F}}(A) \le \underline{\text{M-crit}}_{\mathcal{F}}(B) \quad \text{and} \quad \overline{\text{M-crit}}_{\mathcal{F}}(A) \le \overline{\text{M-crit}}_{\mathcal{F}}(B),$$

— if there is a bi-Lipschitz equivalence $A \simeq B$, then

$$\underline{\text{M-crit}}_{\mathcal{F}}(A) = \underline{\text{M-crit}}_{\mathcal{F}}(B) \quad \text{and} \quad \overline{\text{M-crit}}_{\mathcal{F}}(A) = \overline{\text{M-crit}}_{\mathcal{F}}(B).$$

We do not give the easy proof of this result, but note that for the bi-Lipschitz
invariance, again one needs the full power of the separation property for scales.

In order to compute $\overline{\text{M-crit}}$ and $\underline{\text{M-crit}}$, one can also use packings: denoting
by $P(A, \varepsilon)$ the maximal number of points in $A$ that are pairwise at distance
at least $\varepsilon$, we indeed have $N(A, 2\varepsilon) \le P(A, \varepsilon)$ and $P(A, 2\varepsilon) \le N(A, \varepsilon)$. Here,
once again, the strong separation property is vital to ensure that the factor $2$
is harmless.

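
The counter-example $\{0, 1, 1/2, 1/3, \dots\}$ can be explored numerically. The sketch below computes the minimal number of closed intervals of length $2\varepsilon$ (balls of radius $\varepsilon$ in $\mathbb{R}$) covering the set, using the greedy left-to-right covering, which is optimal for points on a line. Truncating the set at $n \le \lceil 1/\varepsilon \rceil$ does not change the count, because the omitted points lie in $[0, \varepsilon]$ and are covered by the first interval. The function names are illustrative.

```python
import math


def covering_number(points, eps):
    """Minimal number of closed intervals of length 2*eps covering `points` (greedy sweep)."""
    count = 0
    reach = -math.inf
    for p in sorted(points):
        if p > reach:          # p is not covered yet: start a new interval at p
            count += 1
            reach = p + 2 * eps
    return count


def harmonic_set(eps):
    """The points 0 and 1/n for n up to ceil(1/eps)."""
    n_max = math.ceil(1 / eps)
    return [0.0] + [1.0 / n for n in range(1, n_max + 1)]


def dimension_estimate(eps_coarse, eps_fine):
    """Slope of log N(A, eps) against log(1/eps) between two scales."""
    n_coarse = covering_number(harmonic_set(eps_coarse), eps_coarse)
    n_fine = covering_number(harmonic_set(eps_fine), eps_fine)
    return math.log(n_fine / n_coarse) / math.log(eps_coarse / eps_fine)


print(dimension_estimate(1e-3, 1e-5))  # expected to be close to 1/2
```

The slope is only a finite-scale estimate of the Minkowski dimension, but it already sits near $1/2$ while the set itself is countable.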
3.2. Lower bounds via Frostman's lemma. — Finding a large packing
of balls in A is not sufficient to bound the critical parameter from below but, 
as for the Hausdorff dimension, a close analogue that is sufficient is to exhibit 
a measure with small growth. 

Proposition 3.4 (Frostman's Lemma). — For all Borel subsets $A$ of $X$,
if there is a Borel probability measure $\mu$ concentrated on $A$ and a positive
constant $C$ such that for all $x \in A$ and all $r > 0$

$$\mu(B(x, r)) \le C f_s(r)$$

then $\Lambda_{f_s}(A) > 0$ (and in particular $\mathrm{crit}_{\mathcal{F}}(A) \ge s$). Moreover the converse
holds.

The proof can be found for example in [Mat95]. The difficult part is the
converse, while the very useful direct part is straightforward. 



4. Critical parameters of Hilbert cubes 

Let us now use the previous tools to compute critical parameters for the 
Hilbert cubes defined in the introduction. Here X is assumed to be compact. 

The topology of a Hilbert cube HC(X; a) is the product topology, in partic- 
ular it is compact. It need not be infinite dimensional in general; for example 
if X is finite and a is geometric, then HC(X, a) is a finite-dimensional, self- 
similar Cantor set. 

We shall estimate critical parameters for two different kinds of coefficients
a; in both cases the upper bound is obtained with the same method, so let us 
give a technical lemma to avoid repetition. 

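Lemma 4.1. — Let $(X, d)$ be a compact metric space of finite, positive upper
Minkowski dimension $s$ and let $a = (a_n)_{n \ge 1}$ be an $\ell^2$ sequence of positive
numbers. If $L : (0, 1) \to \mathbb{N}^*$ is a non-increasing function such that

$$\sum_{n > L(\varepsilon)} a_n^2 \le \frac{\varepsilon^2}{(\operatorname{diam} X)^2},$$

then for all $\eta > 0$ and all $\varepsilon$ small enough compared to $\eta$, we have

$$(1) \qquad \log N(\mathrm{HC}(X, a), \varepsilon) \le (s + \eta) \Big( \log \prod_{n=1}^{L(\varepsilon/2)} a_n + \frac{1}{2} \log\big( L(\varepsilon/2)! \big) + L(\varepsilon/2) \log \frac{1}{\varepsilon} \Big).$$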

Proof. — Let $s' = s + \eta$. By definition of upper M-dimension, there is a
constant $C$ such that for all $\varepsilon < 1$, $N(X, \varepsilon)\, \varepsilon^{s'} \le C$. We shall construct a
covering of the Hilbert cube from coverings at different scales of $X$. Denoting
by $X_{a_n}$ the space $X$ endowed with the metric $a_n d$, we have $N(X_{a_n}, \varepsilon) =
N(X, \varepsilon / a_n)$, thus for all $\varepsilon$ smaller than $\max a_n$, and each $n$, we can find a
family of $C\, (2 C a_n \sqrt{n} \log n / \varepsilon)^{s'}$ points such that every $x \in X_{a_n}$ is at
distance at most $\varepsilon / (2 C \sqrt{n} \log n)$ from one of them. The use of the sequence
$(\sqrt{n} \log n)$ will become clear in a moment; what is important is that it increases
not too fast, but its inverse is $\ell^2$.

Now any point $(x_1, x_2, \dots)$ in $\mathrm{HC}(X, a)$ is at distance at most $\varepsilon/2$ from
$(x_1, \dots, x_{L(\varepsilon/2)}, 0, 0, \dots)$, which is itself at distance at most

$$\frac{\varepsilon}{2C} \Big( \sum_{n} \frac{1}{n \log^2 n} \Big)^{1/2} \le \frac{\varepsilon}{2}$$

(up to enlarging $C$ if needed) from one of the points $(x_1^{i_1}, \dots, x_{L(\varepsilon/2)}^{i_{L(\varepsilon/2)}}, 0, \dots)$
whose coordinates are taken in the families above. We get

$$N(\mathrm{HC}(X, a), \varepsilon) \le \prod_{n=1}^{L(\varepsilon/2)} C \Big( \frac{2 C a_n \sqrt{n} \log n}{\varepsilon} \Big)^{s'}$$

and we only have left to take the logarithm; two terms can be removed up to
doubling $\eta$: one proportional to $L(\varepsilon/2)$, absorbed by the $L(\varepsilon/2) \log 1/\varepsilon$ term,
and one proportional to $\sum_{n=1}^{L(\varepsilon/2)} \log \log n$, absorbed by the $\log(L(\varepsilon/2)!)$ term.
Note that this last comparison is of course very inefficient, but it avoids adding
an $L(\varepsilon/2) \log \log L(\varepsilon/2)$ term to the formula and the $\log(L(\varepsilon/2)!)$ term must be
present anyway due to the presence of $\sqrt{n}$ in the product above. □

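When $a$ decays exponentially, the Hilbert cube has intermediate size and
its fine critical parameter can be determined.

Proposition 4.2. — Let $X$ be any compact metric space and let $\lambda \in (0, 1)$.
We have

$$\frac{\dim X}{2 \log \frac{1}{\lambda}} \le \mathrm{crit}_{\mathcal{I}_2} \mathrm{HC}(X, (\lambda^n)) \le \overline{\text{M-crit}}_{\mathcal{I}_2} \mathrm{HC}(X, (\lambda^n)) \le \frac{\overline{\text{M-dim}}\, X}{2 \log \frac{1}{\lambda}}.$$

In particular, if $X$ has positive and finite Hausdorff and upper Minkowski
dimension, then

$$\mathrm{crit}_{\mathcal{I}} \mathrm{HC}(X, (\lambda^n)) = 2.$$

Moreover, when $0 < \overline{\text{M-dim}}\, X = \dim X < +\infty$ the 2-fine intermediate
critical parameter of the Hilbert cube is equal to $\dim X / (2 \log 1/\lambda)$.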

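Proof. — We denote by $H$ the generalized Hilbert cube under study. Note that
both inequalities are trivial when $\dim X = 0$ and, respectively, $\overline{\text{M-dim}}\, X =
+\infty$. We therefore assume otherwise.

Using the notation of Lemma 4.1, one can choose $L$ such that

$$L(\varepsilon) \sim \frac{\log \frac{1}{\varepsilon}}{\log \frac{1}{\lambda}}$$

where $f \sim g$ means asymptotic equivalence: $f = g + o(g)$. Then in (1) the
second term is negligible (of the order of $\log \frac{1}{\varepsilon} \log \log \frac{1}{\varepsilon}$) compared to the
first and third ones, and for all $s > \overline{\text{M-dim}}\, X$ we get when $\varepsilon$ is small enough
(up to invoking the lemma for a slightly smaller $s$):

$$\log N(H, \varepsilon) \le \frac{s}{2 \log \frac{1}{\lambda}} \Big( \log \frac{1}{\varepsilon} \Big)^2$$

so that $\overline{\text{M-crit}}_{\mathcal{I}} H \le 2$ and $\overline{\text{M-crit}}_{\mathcal{I}_2} H \le \overline{\text{M-dim}}\, X / (2 \log 1/\lambda)$.

For all $0 < t < \dim X$, there is a Borel probability measure $\nu$ on $X$ such
that $\nu(B(x, r)) \le C r^t$ for all $r$. Such a measure exists by Frostman's lemma
since the $t$-dimensional Hausdorff measure of $X$ is infinite, hence positive. Now
$\mu := \bigotimes_{n \ge 1} \nu$ is a Borel probability measure on $\mathrm{HC}(X, (\lambda^n)) \simeq X^{\mathbb{N}}$, and for all
$r > 0$, all functions $M : \mathbb{R}^+ \to \mathbb{N}$ and all $x$ we have

$$B(x, r) \subset \prod_{n=1}^{M(r)} B_{\lambda^n}(x_n, r) \times X \times X \times \cdots$$

where $B_{\lambda^n}(x_n, r)$ is the ball in the scaled space $X_{\lambda^n}$, and is therefore equal as
a set to $B(x_n, r \lambda^{-n})$. This ball has $\nu$-measure at most $C (r \lambda^{-n})^t$ so that we get

$$\log \mu(B(x, r)) \le t \Big( -M(r) \log \frac{1}{r} + \frac{M(r)^2}{2} \log \frac{1}{\lambda} \Big) + O(M(r)).$$

The optimal choice is then to take

$$M(r) \sim \frac{\log \frac{1}{r}}{\log \frac{1}{\lambda}}$$

so that

$$\log \mu(B(x, r)) \le -\frac{t}{2 \log \frac{1}{\lambda}} \Big( \log \frac{1}{r} \Big)^2 + O\Big( \log \frac{1}{r} \Big).$$

Using Frostman's lemma and letting $t$ go to $\dim X$ we get

$$\mathrm{crit}_{\mathcal{I}_2} H \ge \dim X / (2 \log 1/\lambda)$$

and in particular $\mathrm{crit}_{\mathcal{I}} H \ge 2$. □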
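
The choice of the truncation $M(r)$ in the proof above is a one-variable optimization, which can be checked by brute force. The sketch below minimizes the main term $-M \log(1/r) + (M^2/2) \log(1/\lambda)$ over integers $M$ and compares the result with $\log(1/r)/\log(1/\lambda)$ and with the minimal value $-(\log 1/r)^2 / (2 \log 1/\lambda)$; the function names and the sample values of $r$ and $\lambda$ are illustrative.

```python
import math


def exponent_bound(M, r, lam):
    """Main term -M*log(1/r) + (M**2 / 2)*log(1/lam) of the bound on log mu(B(x, r)) / t."""
    return -M * math.log(1 / r) + 0.5 * M * M * math.log(1 / lam)


def best_truncation(r, lam, M_max=10_000):
    """Integer M in [1, M_max] minimizing the main term (brute force)."""
    return min(range(1, M_max + 1), key=lambda M: exponent_bound(M, r, lam))


r, lam = 2.0 ** -20, 0.5
M_star = best_truncation(r, lam)
print(M_star, math.log(1 / r) / math.log(1 / lam))
print(exponent_bound(M_star, r, lam), -math.log(1 / r) ** 2 / (2 * math.log(1 / lam)))
```

For these sample values the quadratic in $M$ has its vertex exactly at an integer, so the brute-force minimizer and the closed formula agree.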

When $a$ decays polynomially, the corresponding Hilbert cube over any space
of positive and finite dimension has power-exponential size, mostly independent
of the geometry of $X$. Note that we shall need more precision than before
when using Frostman's lemma.

Proposition 4.3. — Let $X$ be any compact metric space and let $\alpha > 1/2$. If
$X$ has positive Hausdorff dimension, then

$$\frac{2}{2\alpha - 1} \le \mathrm{crit}_{\mathcal{P}} \mathrm{HC}(X, (n^{-\alpha}))$$

and if $X$ has finite upper Minkowski dimension then

$$\overline{\text{M-crit}}_{\mathcal{P}} \mathrm{HC}(X, (n^{-\alpha})) \le \frac{2}{2\alpha - 1}.$$

In particular, when $X$ has positive and finite Hausdorff and upper Minkowski
dimensions, the power-exponential critical parameter of $\mathrm{HC}(X, (n^{-\alpha}))$ is
equal to $2/(2\alpha - 1)$.

Proof. — Using the notation of Lemma l4.H L can be chosen such that there 
are constants C < D satisfying 



/ 1 \ 2^-1 /IN 2«-l 

c(-) . 

For all s greater than the upper M-dimension of X and all small enough e we 
have (recalling that according to Stirling's formula, logm! = m log m + 0(m)) 

2 

log N{H, e) ^ s{D - C) 0^ log ^ 

For all t > 2/(2a — 1), the quantity A^(i7, e) exp(—(l/e)*) is therefore bounded. 
It follows that 

M-crit<32HC(X, ^ — - — . 

2a — 1 

To get the lower bound, we start by assuming $\dim X > 1$ (otherwise, take $p > 1/\dim X$ so that $\dim X^p > 1$ and observe that there is a bi-Lipschitz embedding from $\mathrm{HC}(X^p,(n^{-\alpha}))$ to $\mathrm{HC}(X,(n^{-\alpha}))$).

From Frostman's lemma there is a non-zero Borel probability measure $\nu$ on X such that $\nu(B(x,r)) \leq Cr$ for all $r$. As before we define $\mu := \bigotimes_{n \geq 1} \nu$, which is a Borel probability measure on $H = \mathrm{HC}(X,(n^{-\alpha})) \simeq X^{\mathbb{N}}$. We want to precisely estimate the $\mu$-measure of small balls in $H$. Fix a point $x \in H$. For convenience, we introduce the notation $a = (n^{-\alpha})_n$, $a^k = (n^{-\alpha})_{n \geq k}$ and we define $x^k$ similarly. Let also $S_{a_n}(x,r)$ be the sphere of center $x$ and radius $r$ in $X_{a_n}$. We can write

\[ B(x,r) = \bigcup_{0 \leq r_1 \leq r} S_{a_1}(x_1,r_1) \times B\left(x^2,\sqrt{r^2-r_1^2}\right) \]

where the right factor is a ball of $\mathrm{HC}(X,a^2)$. Denoting by $\sigma$ the push-forward of the measure $\nu$ by the map $x' \mapsto d_{a_1}(x_1,x')$, we have

\[ \nu(B(x_1,r)) = \int_0^r \sigma(dr_1) \]

and by Fubini's theorem

\[ \mu(B(x,r)) = \int_0^r \mu\left(B\left(x^2,\sqrt{r^2-r_1^2}\right)\right) \sigma(dr_1). \]

We know that $\sigma([0,r]) \leq Cr/a_1$ for all $r > 0$, thus there exists a coupling measure $\Pi$ on $\mathbb{R}^+ \times \mathbb{R}^+$ supported on $\{(u,v) \mid u \geq v\}$ such that its first marginal is equal to $\sigma$ and its second marginal is less than or equal to $(C/a_1)\,dv$ (with $dv$ the Lebesgue measure). One can indeed take for $\Pi$ the increasing rearrangement between these two measures (see e.g. [Vil09], page 7, for a definition). Using that the left factor in the following integrand is non-increasing, we get

\begin{align*}
\mu(B(x,r)) &= \int \mu\left(B\left(x^2,\sqrt{r^2-r_1^2}\right)\right) \Pi(dr_1\,dv) \\
&\leq \int_0^r \int_{\mathbb{R}^+} \mu\left(B\left(x^2,\sqrt{r^2-v^2}\right)\right) \Pi(dr_1\,dv) \\
&\leq \int_0^r \mu\left(B\left(x^2,\sqrt{r^2-v^2}\right)\right) (C/a_1)\,dv.
\end{align*}



Assume, given an integer $M$, that there is a constant $C_M$ such that $\mu(B(x,r)) \leq C_M r^M$ for all $r$. Then, using the change of variable $v = r\cos\theta$, the above inequality yields

\[ \mu(B(x,r)) \leq \frac{C\,C_M}{a_1} \left[ \int_0^{\pi/2} \sin^{M+1}\theta \, d\theta \right] r^{M+1}. \]

We know that the Wallis integral is asymptotically equivalent to $\sqrt{\pi/(2(M+1))}$, so that there is a positive constant $D$ depending on $C$ such that

\[ \mu(B(x,r)) \leq \frac{D^M (M!)^{\alpha}}{\sqrt{M!}}\, r^M. \]

Defining an integer-valued function $M$ such that $M(r) \sim r^{-\beta}$ we get

\[ \log\mu(B(x,r)) \leq \left( \beta\left(\alpha-\tfrac12\right) - 1 \right) M(r) \log\frac1r + O(M(r)) \]

so that whenever $\beta < 2/(2\alpha-1)$, we have

\[ \log\mu(B(x,r)) \leq -E \left(\frac1r\right)^{\beta} \log\frac1r \]

for some positive constant $E$, and we deduce from Frostman's lemma that $\operatorname{crit}_{\mathcal{P}} H \geq 2/(2\alpha-1)$. □
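The passage from the factorial bound to the threshold $2/(2\alpha-1)$ only uses Stirling's formula. As a sketch, assuming the bound is read as $\mu(B(x,r)) \leq D^M (M!)^{\alpha-1/2} r^M$ and that $M = M(r) \sim r^{-\beta}$ (so that $\log M = \beta\log(1/r) + o(1)$), the computation can be written out as follows.

```latex
\begin{align*}
\log\bigl(D^M (M!)^{\alpha-\frac12} r^M\bigr)
  &= M\log D + \bigl(\alpha-\tfrac12\bigr)\bigl(M\log M + O(M)\bigr) - M\log\tfrac1r \\
  &= \Bigl(\beta\bigl(\alpha-\tfrac12\bigr) - 1\Bigr)\, M\log\tfrac1r + O(M).
\end{align*}
% The leading coefficient is negative exactly when
% \beta(\alpha - 1/2) < 1, that is \beta < 2/(2\alpha - 1).
```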

Corollary 1.1 from the introduction is a direct consequence of the above result.

Proof of Corollary 1.1. — Assume there is a bi-Lipschitz embedding from the Hilbert cube $\mathrm{HC}(X,(n^{-\alpha}))$ to $\mathrm{HC}(Y,(n^{-\beta}))$ where X has positive Hausdorff dimension and Y has finite upper Minkowski dimension. Then by Proposition 4.3 and the monotonicity property, we have

\[ \frac{2}{2\alpha-1} \leq \operatorname{crit}_{\mathcal{P}} \mathrm{HC}(X,(n^{-\alpha})) \leq \operatorname{M-crit}_{\mathcal{P}} \mathrm{HC}(Y,(n^{-\beta})) \leq \frac{2}{2\beta-1} \]

which implies $\beta \leq \alpha$. □



PART II 

LARGENESS OF WASSERSTEIN SPACES 



5. Wasserstein spaces 

For a detailed introduction on optimal transport, the interested reader can
for example consult [Vil03], or [San10] for a more concise overview. Optimal
transport is about moving a given amount of material from one distribution 
to another with the least total cost, where the cost to move a unit of mass 
between two points is given by a cost function. Here the cost function is related 
to a metric, and optimal transport gives a metric on a space of measures. Let 
us give a few precise definitions and the properties we shall need. 

Given an exponent $p \in [1,\infty)$, if $(X,d)$ is a general metric space, always assumed to be Polish (complete separable), and endowed with its Borel $\sigma$-algebra, its Wasserstein space is the set $W_p(X)$ of (Borel) probability measures $\mu$ on X whose p-th moment is finite:

\[ \int d(x_0,x)^p \, \mu(dx) < \infty \quad \text{for some, hence all } x_0 \in X \]

endowed with the following metric: given $\mu,\nu \in W_p(X)$ one sets

\[ W_p(\mu,\nu) = \left( \inf_{\Pi} \int_{X \times X} d(x,y)^p \, \Pi(dx\,dy) \right)^{1/p} \]

where the infimum is over all probability measures $\Pi$ on $X \times X$ that project to $\mu$ on the first factor and to $\nu$ on the second one. Such a measure is called a transport plan between $\mu$ and $\nu$, and is said to be optimal when it achieves the infimum. The function $d^p$ is called the cost function, and the value of $\int_{X \times X} d(x,y)^p \, \Pi(dx\,dy)$ is the total cost of $\Pi$.

In this setting, an optimal transport plan always exists. Note that when X is compact, the set $W_p(X)$ is equal to the set $\mathscr{P}(X)$ of all probability measures on X and $W_p$ metrizes the weak topology.

The name "transport plan" is suggestive: it is a way to describe what amount of mass is transported from one region to another.
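To make the definition concrete, here is a minimal sketch for finitely supported measures on the real line with cost $|x-y|^p$, $p \geq 1$. It relies on the standard fact that in this one-dimensional setting the increasing (monotone) coupling is optimal; the function names `monotone_plan` and `total_cost` are illustrative choices, not notation from the text.

```python
from fractions import Fraction


def monotone_plan(mu, nu):
    """Couple two finitely supported measures on the line in increasing order.

    mu, nu: dicts mapping an atom (a number) to its mass, with equal total mass.
    Returns a list of (x, y, mass) triples describing a transport plan.
    """
    src = sorted(mu.items())
    dst = sorted(nu.items())
    plan = []
    i = j = 0
    left_x = src[0][1]  # mass still to be sent from the current source atom
    left_y = dst[0][1]  # mass still to be received by the current target atom
    while True:
        moved = min(left_x, left_y)
        if moved > 0:
            plan.append((src[i][0], dst[j][0], moved))
        left_x -= moved
        left_y -= moved
        if left_x == 0:
            i += 1
            if i == len(src):
                break
            left_x = src[i][1]
        if left_y == 0:
            j += 1
            if j == len(dst):
                break
            left_y = dst[j][1]
    return plan


def total_cost(plan, p):
    """Total cost of a plan for the cost function |x - y|^p."""
    return sum(mass * abs(x - y) ** p for x, y, mass in plan)


# Example with exact rational masses (assumed data, for illustration only).
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 4), 2: Fraction(3, 4)}
plan = monotone_plan(mu, nu)
cost = total_cost(plan, 2)   # equals W_2(mu, nu)^2 on the line
w2 = float(cost) ** 0.5      # the Wasserstein distance itself
```

Exact fractions are used so that the bookkeeping of remaining masses never suffers from rounding.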

One very useful tool to study optimal transport is cyclical monotonicity. Given a cost function $c$ ($= d^p$ here) on $X \times X$, one says that a set $S \subset X \times X$ is ($c$-)cyclically monotone if for all families of pairs $(x_0,y_0), \dots, (x_k,y_k) \in S$, one has

\[ c(x_0,y_0) + \dots + c(x_k,y_k) \leq c(x_0,y_1) + \dots + c(x_{k-1},y_k) + c(x_k,y_0); \]

in words, one cannot reduce the total cost to move a unit amount of mass from the $x_i$ to the $y_i$ by permuting the target points. A transport plan $\Pi$ is said to be cyclically monotone if its support is. Using continuity of the cost we use here, it is easy to see that an optimal transport plan must be cyclically monotone. It is a non-trivial result that the converse is also true, see [Vil03].
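For a finite set of pairs the condition can be tested by brute force. The sketch below (the name `is_cyclically_monotone` is hypothetical) checks the displayed inequality over all ordered families of distinct pairs; families with repeated pairs need not be tested separately, since the corresponding closed walk splits into simple cycles and the inequality is additive over them.

```python
from itertools import permutations


def is_cyclically_monotone(pairs, cost):
    """Brute-force test of c-cyclical monotonicity for a finite list of pairs."""
    for size in range(2, len(pairs) + 1):
        for family in permutations(pairs, size):
            # Cost of sending each x_i to its own y_i.
            direct = sum(cost(x, y) for x, y in family)
            # Cost after a cyclic shift: x_i goes to y_{i+1}, the last x to the first y.
            shifted = sum(
                cost(family[i][0], family[(i + 1) % size][1]) for i in range(size)
            )
            if direct > shifted:
                return False
    return True


def quadratic(x, y):
    """Cost d^2 on the real line."""
    return (x - y) ** 2


monotone = is_cyclically_monotone([(0, 0), (1, 1)], quadratic)
crossing = is_cyclically_monotone([(0, 1), (1, 0)], quadratic)
```

The second family swaps the targets, so permuting them back lowers the cost and the test fails, as expected. The running time is factorial in the number of pairs, so this is only meant for very small examples.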



6. Embedding powers 

This section is logically independent of the rest of the article. We prove Theorem 1.2 and consider its optimality and its dynamical consequence.

6.1. Proof of Theorem 1.2. — The first power of X embeds isometrically by $x \mapsto \delta_x$ where $\delta_x$ is the Dirac mass at a point. To construct an embedding $f$ of a higher power of X into its Wasserstein space, the idea is to encode a tuple by a measure supported on its elements, without adding any extra symmetry: one should be able to distinguish $f(a,b,\dots)$ from $f(b,a,\dots)$. Define the map

\[ f : X^k \to W_p(X), \qquad x = (x_1,\dots,x_k) \mapsto a \sum_{i=1}^k 2^{-i}\, \delta_{x_i} \]

where $a = 1/(1-2^{-k})$ is a normalizing constant. This choice of masses moreover ensures that different subsets of the tuple have different masses. This map obviously has the intertwining property since $\varphi_\#(\delta_x) = \delta_{\varphi(x)}$.

Lemma 6.1. — The map $f$ is $(a/2)^{1/p}$-Lipschitz when $X^k$ is endowed with the metric $d_p$.

Proof. — There is an obvious transport plan from an image $f(x)$ to another $f(y)$, given by $a \sum_i 2^{-i}\, \delta_{x_i} \otimes \delta_{y_i}$. Its $L^p$ cost is

\[ a \sum_i 2^{-i} d(x_i,y_i)^p \leq \frac{a}{2} \sum_i d(x_i,y_i)^p \]

so that $W_p(f(x),f(y)) \leq (a/2)^{1/p}\, d_p(x,y)$. □
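The construction of $f$ and the upper bound of Lemma 6.1 can be checked on small examples. The following sketch takes X to be the real line purely for illustration; `embed` and `obvious_plan_cost` are hypothetical helper names. It verifies that the masses sum to 1, that distinct subsets of indices carry distinct masses, and that the cost of the obvious plan is at most $(a/2)\sum_i d(x_i,y_i)^p$.

```python
from fractions import Fraction
from itertools import combinations


def weights(k):
    """Masses a * 2^{-i}, i = 1..k, with a = 1 / (1 - 2^{-k})."""
    a = 1 / (1 - Fraction(1, 2 ** k))
    return [a * Fraction(1, 2 ** i) for i in range(1, k + 1)]


def embed(x):
    """Return f(x) as a dict atom -> mass (coinciding points are merged)."""
    measure = {}
    for point, mass in zip(x, weights(len(x))):
        measure[point] = measure.get(point, 0) + mass
    return measure


def obvious_plan_cost(x, y, p):
    """Cost of the plan sending the mass a * 2^{-i} from x_i to y_i."""
    return sum(w * abs(xi - yi) ** p for w, xi, yi in zip(weights(len(x)), x, y))


x, y, p = (0, 1, 3), (1, 1, 0), 2
k = len(x)
a = 1 / (1 - Fraction(1, 2 ** k))
cost = obvious_plan_cost(x, y, p)
bound = (a / 2) * sum(abs(xi - yi) ** p for xi, yi in zip(x, y))

# All 2^k subsets of indices have pairwise distinct total masses.
subset_masses = {
    sum(subset)
    for size in range(k + 1)
    for subset in combinations(weights(k), size)
}
```

Since $W_p(f(x),f(y))^p$ is an infimum over plans, the quantity `cost` is only an upper bound for it, which is all the lemma needs.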
Our goal is now to bound $W_p(f(x),f(y))$ from below. The very formulation of the Wasserstein metric makes it more difficult to give lower bounds than upper bounds. One classic way around this issue is to use a dual formulation (Kantorovich duality) that expresses the minimal cost in terms of a supremum. Here we give a more direct, combinatorial approach based on cyclical monotonicity.

The costs of all transport plans below are computed with respect to the cost $d^p$ where $p$ is fixed.

6.1.1. Labelled graphs. — To describe transport plans, we shall use labelled graphs, defined as tuples $G = (V,E,m,m_0,m_1)$ where $V$ is a finite subset of X, $E$ is a set of pairs $(x,y) \in V^2$ where $x \neq y$ (so that $G$ is an oriented graph without loops), $m$ is a function $E \to [0,1]$ and $m_0$, $m_1$ are functions $V \to [0,1]$. An element of $V$ will usually be denoted by $x$ if it is thought of as a starting point, $y$ if it is thought of as a final point, and $v$ if no such assumption is made.

To any transport plan between finitely supported measures, one can asso- 
ciate a labelled graph as follows. 

Definition 6.2. — Let $\mu,\nu$ be probability measures supported on finite sets $A, B \subset X$ and let $\Pi$ be any transport plan from $\mu$ to $\nu$. We define a labelled graph $G^\Pi$ by: $V^\Pi = A \cup B$,

\[ E^\Pi = \operatorname{supp}\Pi \setminus \Delta = \{ (x,y) \in X \times X \mid x \neq y \text{ and } \Pi(\{(x,y)\}) > 0 \}, \]

\[ m^\Pi(x,y) = \Pi(\{(x,y)\}), \quad m_0^\Pi(x) = \mu(\{x\}) \quad \text{and} \quad m_1^\Pi(y) = \nu(\{y\}). \]

In other words, the graph encodes the initial and final measures and the amount of mass moved from any given point in $\operatorname{supp}\mu$ to any given point in $\operatorname{supp}\nu$. The transport plan itself can be retrieved from its graph; for example its cost is

\[ c_p(\Pi) = \sum_{e \in E} m^\Pi(e)\, d(e^-,e^+)^p \]

where $e^-$ and $e^+$ are the starting and ending points of the edge $e$.

Not every labelled graph encodes a transport plan between two measures. We say that $G$ is admissible if:

- for all $e \in E$, $m(e) > 0$,

- for all $v \in V$, $m_0(v) + \sum_{e=(x,v) \in E} m(e) - \sum_{e=(v,y) \in E} m(e) = m_1(v)$ (this is mass invariance), $\sum_{e=(x,v) \in E} m(e) \leq m_1(v)$ and $\sum_{e=(v,y) \in E} m(e) \leq m_0(v)$.

A labelled graph is admissible if and only if it is the graph of some transport 
plan. The next steps of the proof shall give some information on the graphs 
of optimal plans. 
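The admissibility conditions translate directly into a check on finite data. Here is a minimal sketch, with a labelled graph stored as plain dictionaries (a representation chosen for illustration; `is_admissible` is a hypothetical name).

```python
from fractions import Fraction


def is_admissible(vertices, m, m0, m1):
    """Check the admissibility conditions of a labelled graph.

    vertices: iterable of vertex labels
    m:  dict mapping an edge (x, y) to the mass it carries
    m0: dict mapping a vertex to its initial mass
    m1: dict mapping a vertex to its final mass
    """
    # Every edge must carry a positive mass.
    if any(mass <= 0 for mass in m.values()):
        return False
    for v in vertices:
        inflow = sum(mass for (x, y), mass in m.items() if y == v)
        outflow = sum(mass for (x, y), mass in m.items() if x == v)
        # Mass invariance at v.
        if m0[v] + inflow - outflow != m1[v]:
            return False
        # Incoming mass fits in the final mass, outgoing mass in the initial one.
        if inflow > m1[v] or outflow > m0[v]:
            return False
    return True


half = Fraction(1, 2)
good = is_admissible(
    ["a", "b"],
    {("a", "b"): half},
    {"a": Fraction(1), "b": Fraction(0)},
    {"a": half, "b": half},
)
bad = is_admissible(
    ["a", "b"],
    {("a", "b"): Fraction(3, 4)},
    {"a": Fraction(1), "b": Fraction(0)},
    {"a": half, "b": half},
)
```

In the second example the edge carries more mass than the vertex labels allow, so mass invariance fails at both vertices.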
6.1.2. The graph of some optimal plan is a forest. — Let us introduce some notation related to a given labelled graph $G$. A path is a tuple of edges $P = (e_1,\dots,e_l)$ such that $e_i$ has an endpoint in common with $e_{i+1}$ for all $i$. If moreover $e_i^+ = e_{i+1}^-$ holds for all $i$, we say that $P$ is an oriented path. We define the unitary cost of $P$ as the cost of a unit mass travelling along $P$, that is $c(P) = \sum_{i=1}^l d(e_i^-,e_i^+)^p$, and the flow of $P$ as the amount of mass travelling along $P$, that is $\phi(P) = \min_i m(e_i)$. Cycles and oriented cycles are defined in an obvious, similar way; a graph is a forest if it contains no cycle.

Lemma 6.3. — If $\Pi$ is an optimal plan between any two finitely supported measures $\mu,\nu$, then $G^\Pi$ contains no oriented cycle.

Proof. — This is a direct consequence of the cyclical monotonicity of optimal plans: if there were points $v_1,v_2,\dots,v_n$ in $V^\Pi$ such that $v_n = v_1$ and $m(i) := m^\Pi(v_i,v_{i+1}) > 0$ for all $i < n$, then by subtracting the minimal value of $m(i)$ from each of them one would get a new admissible labelled graph with $m_0 = m_0^\Pi$ and $m_1 = m_1^\Pi$ and cost less than the cost of $G^\Pi$. This new graph would give a new transport plan from $\mu$ to $\nu$, cheaper than $\Pi$. □

An optimal plan can a priori have non-oriented cycles, but up to changing 
the plan (without changing its cost), we can assume it does not. 

Lemma 6.4. — Between any two finitely supported measures $\mu,\nu$, there is an optimal plan $\Pi$ such that $G^\Pi$ is a forest.

Proof. — Let $\Pi$ be any optimal plan from $\mu$ to $\nu$, and let $G_0 = G^\Pi$ be its graph.

A non-oriented cycle is determined by two sets of vertices $x_1,\dots,x_n$ and $y_1,\dots,y_n$ and two sets of oriented paths $P_i : x_i \to y_i$, $Q_i : x_i \to y_{i+1}$ where $y_{n+1} := y_1$, see Figure 1.

Figure 1. A non-oriented cycle: the $x_i$'s and $y_i$'s are the vertices where the edges change orientation.

Consider a minimal non-oriented cycle of $G_0$, so that no two paths among all $P_i$'s and $Q_i$'s share an edge.

One can construct a new admissible labelled graph $G_1$, with the same vertex labels $m_0$ and $m_1$ as $G_0$, by adding a small $\varepsilon$ to all $m(e)$ where $e$ appears in some $P_i$, and subtracting the same $\varepsilon$ from all $m(e)$ where $e$ appears in some $Q_i$. This operation adds $\varepsilon$ to $\phi(P_i)$ and $-\varepsilon$ to $\phi(Q_i)$, thus it adds $\varepsilon \sum_i (c(P_i) - c(Q_i))$ to the cost of $\Pi$.

Since $\Pi$ is optimal, one cannot reduce its cost by this operation. This implies that $\sum_i (c(P_i) - c(Q_i)) = 0$. By operating as above with $\varepsilon$ equal to plus or minus the minimal value of all $m(e)$ where $e$ appears in a $P_i$ or in a $Q_i$, one designs the wanted new admissible graph $G_1$.

Now, $G_1$ has its edge set included in the edge set of $G_0$, with at least one fewer non-oriented cycle. By repeating this operation, one constructs an admissible labelled graph $G$ without cycle, that has the same total cost and the same vertex labels as $G_0$. The transport plan defined by $G$ is therefore optimal, from $\mu$ to $\nu$. □

The non-existence of cycles has an important consequence. 

Lemma 6.5. — Let $\Pi$ be a transport plan between two finitely supported measures $\mu$ and $\nu$, whose graph is a forest. If there is some real number $r$ such that all $m_0^\Pi(v)$ and all $m_1^\Pi(v)$ are integer multiples of $r$, then all $m^\Pi(e)$ are integer multiples of $r$.

Proof. — Let $G_0 = G^\Pi = (V,E,m,m_0,m_1)$. If $G_0$ has no edge, then we are done. Otherwise, $G_0$ has a leaf, that is a vertex $x_0$ connected to exactly one vertex $y_0$, by an edge $e_0$. Assume for example that $e_0 = (x_0,y_0)$ (the other case is treated similarly). Then $m(e_0) = m_0(x_0) - m_1(x_0)$ is an integer multiple of $r$.

Define $G_1 = (V, E \setminus \{e_0\}, m', m_0', m_1')$ where:

- $m'(e) = m(e)$ for all $e \in E \setminus \{e_0\}$,

- $m_0'(x_0) = m_0(x_0) - m(e_0)$,

- $m_0'(x) = m_0(x)$ for all $x \in V \setminus \{x_0\}$,

- $m_1'(y_0) = m_1(y_0) - m(e_0)$,

- $m_1'(y) = m_1(y)$ for all $y \in V \setminus \{y_0\}$.

Then $G_1$ is still admissible (with different starting and ending measures $\mu'$ and $\nu'$, though), and all $m_0'(v), m_1'(v)$ are integer multiples of $r$. By induction, we are reduced to the case of an edgeless graph. □

6.1.3. End of the proof. — Now we are ready to bound $W_p(f(x),f(y))$ from below in terms of $d_\infty(x,y)$. Let $\Pi$ be an optimal transport plan from $f(x)$ to $f(y)$ whose graph $G = (V,E,m,m_0,m_1)$ is a forest.

Lemma 6.6. — For every index $i_0$, there is a path in $G$ connecting $x_{i_0}$ to $y_{i_0}$.

Proof. — The choice of $f$ shows that all $m_0(v), m_1(v)$ are integer multiples of $a2^{-k}$, so that all $m(e)$ are integer multiples of $a2^{-k}$. Let $n(e), n_0(v), n_1(v) \in \mathbb{N}$ be such that $m(e) = n(e)\,a2^{-k}$, $m_0(v) = n_0(v)\,a2^{-k}$ and $m_1(v) = n_1(v)\,a2^{-k}$. Then the only $v \in V = \operatorname{supp} f(x) \cup \operatorname{supp} f(y)$ such that $n_0(v)$ contains $2^{k-i_0}$ in its base-2 expansion is $x_{i_0}$. Similarly, the only $w \in V$ such that $n_1(w)$ contains $2^{k-i_0}$ in its base-2 expansion is $y_{i_0}$. Let $E' \subset E$ be the set of edges $e$ such that $n(e)$ contains $2^{k-i_0}$ in its base-2 expansion.

Any vertex $v$ such that $n_0(v) - n_1(v)$ does not contain $2^{k-i_0}$ in its base-2 expansion must be adjacent to an even number of edges of $E'$ due to mass invariance. Therefore the non-oriented graph induced by $E'$ has exactly two points of odd degree: $x_{i_0}$ and $y_{i_0}$. It is well known, and a consequence of a simple double-counting argument, that a graph has an even number of odd-degree vertices, from which it follows that the $E'$-connected component of $x_{i_0}$ must contain $y_{i_0}$. □

From now on, fix an index $i_0$ that maximizes $d(x_i,y_i)$ and let $P_0$ be a minimal path between $x_{i_0}$ and $y_{i_0}$. Each final point of each edge in this path has to be some $y_j$, all distinct by minimality, so that $P_0$ has length at most $k$. It follows by a convexity argument that $c(P_0)$ is at least $k\,(d(x_{i_0},y_{i_0})/k)^p$. Lemma 6.5 implies $\phi(P_0) \geq a2^{-k}$ so that the cost of $\Pi$ is at least $a2^{-k}\, d(x_{i_0},y_{i_0})^p / k^{p-1}$. Since $a2^{-k} = 1/(2^k-1)$, we get

\[ W_p(f(x),f(y)) \geq \frac{1}{k^{1-\frac1p}(2^k-1)^{\frac1p}}\, d_\infty(x,y) \geq \frac{1}{k\,(2^k-1)^{\frac1p}}\, d_p(x,y) \]

which ends the proof of Theorem 1.2.

6.2. Discussion of the embedding constants. — One can wonder if the constants in Theorem 1.2 are optimal. We shall see in the simplest possible example that they are off by at most a polynomial factor, then see how they can be improved in a specific case.

Proposition 6.7. — Let $X = \{0,1\}$ where the two elements are at distance 1 and consider a map $g : X^k \to W_p(X)$ such that

\[ m\, d_p(x,y) \leq W_p(g(x),g(y)) \leq M\, d_p(x,y) \]

for all $x,y \in X^k$ and some positive constants $m, M$. Then

\[ m \leq \frac{1}{(2^k-1)^{1/p}} \quad \text{and} \quad \frac{M}{m} \geq \left( \frac{2^k-1}{k} \right)^{1/p}. \]

Moreover there is a map whose constants satisfy $m = (2^k-1)^{-1/p}$ and $M/m \leq (2^k-1)^{1/p}$.

Proof. — By homogeneity, it is sufficient to consider $p = 1$, in which case $X^k$ is the $k$-dimensional discrete hypercube endowed with the Hamming metric: two elements are at a distance equal to the number of bits by which they differ. Moreover $W_1(X)$ identifies with the segment $[0,1]$ endowed with the usual metric $|\cdot|$: a number $t$ corresponds to the measure $t\delta_0 + (1-t)\delta_1$.

The diameter of $X^k$ is $k$, so that the diameter of $g(X^k)$ is at most $Mk$. Since $g(X^k)$ has $2^k$ elements, by the pigeon-hole principle at least two of them are at distance at most $(2^k-1)^{-1}Mk$. Since the distance between their inverse images is at least 1, we get $m \leq (2^k-1)^{-1}Mk$ so that $M/m \geq (2^k-1)/k$. The pigeon-hole principle also gives $m \leq (2^k-1)^{-1}$ simply by using that $W_1(X)$ has diameter 1.

To get a map $g$ with $M/m = 2^k-1$, it suffices to use a Gray code: it is an enumeration $x_1,x_2,\dots,x_{2^k}$ of the elements of $X^k$, such that two consecutive elements are adjacent (see for example [Ham80]). Letting $g(x_i) := (i-1)/(2^k-1)$ we get a map with $M \leq 1$ and $m = (2^k-1)^{-1}$. □
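The Gray code construction is easy to test numerically in the case $p = 1$. The sketch below uses the standard reflected binary Gray code (one possible choice of enumeration, not necessarily the one the author has in mind) and computes the best constants $m$ and $M$ exactly; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def gray_code(k):
    """Reflected binary Gray code: consecutive words differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(2 ** k)]


def hamming(u, v):
    """Hamming distance between two words encoded as integers."""
    return bin(u ^ v).count("1")


def embedding_constants(k):
    """Best constants (m, M) for the map sending the i-th code word to i / (2^k - 1).

    The enumeration is 0-indexed here, which matches (i - 1) / (2^k - 1)
    for the 1-indexed enumeration x_1, ..., x_{2^k} of the text.
    """
    code = gray_code(k)
    scale = 2 ** k - 1
    ratios = [
        Fraction(j - i, scale * hamming(code[i], code[j]))
        for i, j in combinations(range(2 ** k), 2)
    ]
    return min(ratios), max(ratios)


m, M = embedding_constants(3)
```

For this particular code the first and last words also differ in a single bit, which is why the upper constant reaches 1.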



Note that in Proposition 6.7 one could improve the lower bound on $M/m$ by a factor asymptotically of the order of $2^{1/p}$ by using the fact that every element in $X^k$ has an antipode, that is an element at maximal distance from it. Let us give an example where the constants are much better.



Example 6.8. — Let $X = \{0,1\}^{\mathbb{N}}$ with the following metric: given $x = (x^1,x^2,\dots) \neq y = (y^1,y^2,\dots)$ in X, $d(x,y) = 2^{-i}$ where $i$ is the least index such that $x^i \neq y^i$. Then given $k$, let $\ell$ be the least integer such that $2^\ell \geq k$ and let $w_1,\dots,w_k \in \{0,1\}^\ell$ be distinct words on $\ell$ letters. For $x = (x^1,x^2,\dots) \in X$ and $w = (w^1,\dots,w^\ell) \in \{0,1\}^\ell$, define $wx$ as the element $(w^1,w^2,\dots,w^\ell,x^1,x^2,\dots)$ of X.

Now let $g : X^k \to W_p(X)$ be defined by

\[ g(x = (x_1,\dots,x_k)) = \sum_{i=1}^k \frac1k\, \delta_{w_i x_i}. \]

For all $x,y \in X$ and all $i \neq j$, we have $d(w_i x,w_j y) \geq 2^{-\ell} \geq d(w_i x,w_i y)$. It follows that

\[ W_p(g(x),g(y)) = \left( \sum_i \frac1k\, 2^{-p\ell} d^p(x_i,y_i) \right)^{1/p} = \frac{2^{-\ell}}{k^{1/p}}\, d_p(x,y). \]

For this example, we have $M = m$ and moreover $m$ has only the order of $k^{-1-1/p}$ instead of being exponentially small.

This example could be generalised to more general spaces, for example the 
middle-third Cantor set. What is important is that the various components 
of a given depth are separated by a distance at least the diameter of the 
components and that the metric does not decrease too much between d{x, y) 
and d{wx,wy) (any bound that is exponential in the length of w would do). 

6.3. Dynamical largeness. — In this section, X is assumed to be compact. Given a continuous map $\varphi : X \to X$, for any $n \in \mathbb{N}$ one defines a new metric on X by

\[ d_{[n]}(x,y) := \max\{ d(\varphi^i(x),\varphi^i(y)) ;\ 0 \leq i \leq n \}. \]

Given $\varepsilon > 0$, one says that a subset $S$ of X is $(n,\varepsilon)$-separated if $d_{[n]}(x,y) \geq \varepsilon$ whenever $x \neq y \in S$. Denoting by $P(\varphi,\varepsilon,n)$ the maximal size of a $(n,\varepsilon)$-separated set, the topological entropy of $\varphi$ is defined as

\[ h_{\mathrm{top}}(\varphi) := \lim_{\varepsilon \to 0} \limsup_{n \to +\infty} \frac{\log P(\varphi,\varepsilon,n)}{n}. \]

Note that this limit exists since $\limsup_{n \to +\infty} \frac1n \log P(\varphi,\varepsilon,n)$ is nonincreasing in $\varepsilon$. The adjective "topological" is relevant since $h_{\mathrm{top}}(\varphi)$ does not depend upon the distance on X, but only on the topology it defines. The topological entropy is in some sense a global measure of the dependence on initial condition of the considered dynamical system. The map $\times_d : x \mapsto dx \bmod 1$ acting on the circle is a classical example, whose topological entropy is $\log d$.
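The definition of separated sets can be explored on the doubling map $x \mapsto 2x \bmod 1$. The sketch below builds an $(n,\varepsilon)$-separated set greedily from a finite list of candidates; this only gives a lower bound on $P(\varphi,\varepsilon,n)$, and all function names are illustrative. Exact fractions avoid rounding issues on the circle.

```python
from fractions import Fraction


def circle_distance(x, y):
    """Distance on the circle R/Z."""
    gap = abs(x - y) % 1
    return min(gap, 1 - gap)


def doubling(x):
    """The map x -> 2x mod 1."""
    return (2 * x) % 1


def bowen_distance(x, y, n, phi=doubling):
    """d_[n](x, y) = max over 0 <= i <= n of d(phi^i(x), phi^i(y))."""
    best = circle_distance(x, y)
    for _ in range(n):
        x, y = phi(x), phi(y)
        best = max(best, circle_distance(x, y))
    return best


def greedy_separated(candidates, n, eps, phi=doubling):
    """Greedily extract an (n, eps)-separated subset of the candidates."""
    chosen = []
    for point in candidates:
        if all(bowen_distance(point, other, n, phi) >= eps for other in chosen):
            chosen.append(point)
    return chosen


n = 3
grid = [Fraction(j, 2 ** (n + 2)) for j in range(2 ** (n + 2))]
separated = greedy_separated(grid, n, Fraction(1, 4))
```

With $\varepsilon = 1/4$, two distinct points of the grid of mesh $2^{-(n+2)}$ see their distance double at each step until it reaches $1/4$, so the whole grid is separated; the resulting count $2^{n+2}$ grows at the exponential rate $\log 2$, in agreement with the entropy of the doubling map.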

Topological entropy was first introduced by Adler, Konheim and McAndrew [AKM65] and the present definition was given independently by Dinaburg [Din70] and Bowen [Bow71].

Now, the metric mean dimension is

\[ \operatorname{mdim}_M(\varphi,d) := \liminf_{\varepsilon \to 0} \limsup_{n \to +\infty} \frac{\log P(\varphi,\varepsilon,n)}{n\,|\log\varepsilon|}. \]

It is zero as soon as topological entropy is finite. Note that Lindenstrauss and Weiss define the metric mean dimension using covering sets rather than separated sets; but this does not matter since their sizes are comparable.
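Let us now prove that when $h_{\mathrm{top}}(\varphi) > 0$, then $\varphi_\# : W_p(X) \to W_p(X)$ has positive metric mean dimension.

Proof of Corollary 1.3. — Let $\varepsilon, \eta > 0$ and $k$ be such that $\eta \geq k(2^k-1)^{1/p}\varepsilon$. If $A$ is a $(n,\eta)$-separated set for $(X,\varphi,d)$ then $A^k \subset X^k$ is a $(n,\eta)$-separated set for $(X^k,\varphi_k,d_\infty)$. Then Theorem 1.2 shows that $f(A^k)$ is a $(n,\varepsilon)$-separated set for $(W_p(X),\varphi_\#,W_p)$, so that

\[ P(\varphi_\#,\varepsilon,n) \geq P(\varphi,\eta,n)^k. \]

Let $H < h_{\mathrm{top}}(\varphi)$ and $\beta < 1$. For all $\varepsilon > 0$ small enough, and for arbitrarily large integer $n$ we have $P(\varphi,\varepsilon,n) \geq \exp(nH)$. Define

\[ k = \left\lfloor \frac{\beta p(-\log\varepsilon)}{\log 2} \right\rfloor; \]

then $k(2^k-1)^{1/p}\varepsilon = O\left((-\log\varepsilon)\,\varepsilon^{1-\beta}\right)$ when $\varepsilon \to 0$. Therefore, for all small enough $\varepsilon$, there are arbitrarily large $n$ such that

\begin{align*}
P(\varphi_\#,\varepsilon,n) &\geq \exp(nHk) \geq \exp\left( nH\left( \frac{\beta p(-\log\varepsilon)}{\log 2} - 1 \right) \right), \\
\frac{\log P(\varphi_\#,\varepsilon,n)}{n(-\log\varepsilon)} &\geq \frac{H\beta p}{\log 2} - \frac{H}{-\log\varepsilon}, \\
\operatorname{mdim}_M(\varphi_\#,W_p) &\geq \frac{H\beta p}{\log 2}.
\end{align*}

Letting $H \to h_{\mathrm{top}}(\varphi)$ and $\beta \to 1$ gives

\[ \operatorname{mdim}_M(\varphi_\#,W_p) \geq \frac{p\, h_{\mathrm{top}}(\varphi)}{\log 2} \]

as claimed. □

In the case of the shift on certain metrics on $\{0,1\}^{\mathbb{N}}$, one could want to use the better bound obtained in Example 6.8. But the map $g$ defined there does not intertwine $\varphi_k$ and $\varphi_\#$, and the method above does not apply.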



7. Embedding Hilbert cubes 

In this last section we prove the two theorems about embeddings of Hilbert 
cubes in Wasserstein spaces and deduce consequences on their critical param- 
eters. 
7.1. Embedding small Hilbert cube in the general case. — This section is devoted to the proof of Theorem 1.4. We use the same kind of map as in the proof of Theorem 1.2 but with coefficients that decrease faster to get better point separation.

We assume here that X is compact. Let $\lambda, \beta \in (0,1)$ be real numbers to be more precisely chosen afterward and consider the following map:

\[ g : X^{\mathbb{N}} \to W_2(X), \qquad x = (x_1,x_2,\dots) \mapsto (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1}\, \delta_{x_n} \]

where $X^{\mathbb{N}}$ will be identified with $\mathrm{HC}(X;(\lambda^n))$. We choose $\beta < 1/2$, so that $g$ is one-to-one. It is readily seen to be a continuous map (when $X^{\mathbb{N}}$ is endowed with the product topology), and we have to bound from below $W_2(g(x),g(y))$ for all $x,y \in X^{\mathbb{N}}$.

First, since $g(x)$ gives a mass at least $1-\beta$ to $x_1$, it gives a mass at most $\beta$ to $X \setminus \{x_1\}$, and any transport plan from $g(x)$ to $g(y)$ moves a mass at least $1-2\beta$ from $x_1$ to $y_1$. We already have $W_2(g(x),g(y))^2 \geq (1-2\beta)\, d(x_1,y_1)^2$.

If all distances $d(x_n,y_n)$ are of the same order as $d(x_1,y_1)$, then this first bound is sufficient for our purpose. Otherwise, we shall reduce to an optimal transport problem involving partial measures. Define a new map $g_2$ by $g_2(x) = (1-\beta) \sum_{n=2}^{\infty} \beta^{n-1}\, \delta_{x_n}$; its images are measures of mass $\beta$. Note that all the theory of optimal transport applies to non-probability measures, as soon as the source and target measures have the same, finite total mass. Define a new cost function

\[ \tilde c(x,y) = \min\left( d(x,y)^2,\ d(x,y_1)^2 + d(x_1,y)^2 \right). \]

Let $\Pi$ be any transport plan from $g(x)$ to $g(y)$. Then it can be written $\Pi = \Pi_1 + \Pi_\to + \Pi_\leftarrow + \Pi_2$ where:

— $\Pi_1$ has mass between $1-2\beta$ and $1-\beta$ and is supported on $\{(x_1,y_1)\}$,

— $\Pi_\to$ is supported on $\{x_2,x_3,\dots\} \times \{y_1\}$,

— $\Pi_\leftarrow$ is supported on $\{x_1\} \times \{y_2,y_3,\dots\}$ and has same mass as $\Pi_\to$,

— $\Pi_2$ is supported on $\{x_2,x_3,\dots\} \times \{y_2,y_3,\dots\}$.

To see this, proceed as follows. First, letting $h : (n,m) \mapsto (x_n,y_m)$ there is a measure $\Pi'$ on $\mathbb{N} \times \mathbb{N}$ such that $h_\#\Pi' = \Pi$ and the marginals of $\Pi'$ both are equal to $(1-\beta)\sum_n \beta^{n-1}\delta_n$. This is a direct application of classical methods, see for example the gluing lemma in [Vil03]. Then, let $\Pi'_1$, $\Pi'_2$, $\Pi'_\to$ and $\Pi'_\leftarrow$ be the restrictions of $\Pi'$ to $\{(1,1)\}$, $\{2,3,\dots\} \times \{2,3,\dots\}$, $\{2,3,\dots\} \times \{1\}$ and $\{1\} \times \{2,3,\dots\}$. Then, setting $\Pi_* := h_\#\Pi'_*$ produces the desired decomposition. Let $m$ be the mass of $\Pi_\to$ (which equals the mass of $\Pi_\leftarrow$) and define

\[ \Pi_\to * \Pi_\leftarrow = \frac1m\, (p_1)_\#(\Pi_\to) \otimes (p_2)_\#(\Pi_\leftarrow) \]

where $p_i$ is the projection on the $i$-th factor. If we were to identify $x_1$ to $y_1$, this would define a concatenation of $\Pi_\to$ and $\Pi_\leftarrow$ (note that the use of a product is sensible here, since the trajectories to be concatenated all would pass through $x_1 \simeq y_1$ and there is no specific coupling between the $x_n$'s and the $y_m$'s to remember).

Define further $\tilde\Pi = \Pi_2 + \Pi_\to * \Pi_\leftarrow$. It is in some sense the $g_2$ part of $\Pi$, in particular it has mass $\beta$.

Let us prove that, denoting by $c(\Pi)$ the total cost of the transport plan $\Pi$ under the cost function $c = d^2$ and by $\tilde c(\tilde\Pi)$ the total cost of $\tilde\Pi$ under $\tilde c$, we have

\[ c(\Pi) \geq c(\Pi_1) + \tilde c(\tilde\Pi). \]

The cost of $\Pi$ is the sum of the costs of its parts, and the second term of the right-hand side is to bound from below $c(\Pi_\to + \Pi_2 + \Pi_\leftarrow)$. Consider a small amount of mass moved by this partial transport plan; it goes under $\Pi$ from some $x_i$ to some $y_j$ ($i,j \geq 2$) either directly, or it is moved to $y_1$ and an equivalent amount of mass is moved from $x_1$ to $y_j$. In the first case we use $\tilde c \leq c$, in the second case we use $\tilde c(x,y) \leq d(x,y_1)^2 + d(x_1,y)^2$.

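As already stated, $c(\Pi_1) \geq (1-2\beta)\, d(x_1,y_1)^2$, and we have left to evaluate $\tilde c(\tilde\Pi)$. Given $x,y$, set $d = d(x,y)$, $d_{11} = d(x_1,y_1)$, $a = d(x,y_1)$ and $b = d(x_1,y)$. By the triangle inequality, $a + b + d_{11} \geq d(x,y)$. Using $a^2 + b^2 \geq \frac12(a+b)^2$, we get $\tilde c \geq \min\left(d^2,\ \frac12 d^2 - d\,d_{11} + \frac12 d_{11}^2\right)$. We shall bound $-d\,d_{11}$ by using $(\sqrt{\varepsilon}\,d - d_{11}/\sqrt{\varepsilon})^2 \geq 0$ for any positive $\varepsilon < 1$ to be optimized later on. The inequality

\[ \tilde c \geq \frac{1-\varepsilon}{2}\, d^2 - \frac{1-\varepsilon}{2\varepsilon}\, d_{11}^2 \]

follows. Since $\tilde\Pi$ has mass $\beta$, we therefore get $c(\Pi) \geq A\, d_{11}^2 + \frac{1-\varepsilon}{2}\, c(\tilde\Pi)$ where

\[ A = 1 - 2\beta - \frac{\beta(1-\varepsilon)}{2\varepsilon} \]

is positive if $\varepsilon$ is large enough (precisely $\varepsilon > \beta/(2-3\beta)$). Since $\tilde\Pi$ is a transport plan from $g_2(x)$ to $g_2(y)$ where $g_2$ is merely $g$ composed with the left shift (up to the mass factor $\beta$), an induction shows that

\[ c(\Pi) \geq A\, d(x_1,y_1)^2 + AB\, d(x_2,y_2)^2 + AB^2\, d(x_3,y_3)^2 + \dots \]

where

\[ B = \frac{\beta(1-\varepsilon)}{2}. \]

Figure 2. After dividing the side-length sequence into blocks, we apply the induction hypothesis to each block to get adequate families of boxes in each slice $S_i$.

As a consequence, $g$ is sub-Lipschitz (with constant $\sqrt{A/B}$) from $\mathrm{HC}(X;(\lambda^n))$ to $W_2(X)$ where $\lambda = \sqrt{B}$. The condition on $\varepsilon$ implies that

\[ B < \frac{\beta(1-2\beta)}{2-3\beta} \]

and any such $B$ with $\beta < 1/2$ can be obtained. The optimal value for $\beta$ is 1/3, which gives an upper bound of 1/9 on $B$. We can therefore get any $\lambda < 1/3$.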
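To see where the value 1/3 comes from, here is a short check. It assumes that the constraint on $B$ takes the form $B < F(\beta) := \beta(1-2\beta)/(2-3\beta)$, which is what one obtains by inserting $\varepsilon > \beta/(2-3\beta)$ into $B = \beta(1-\varepsilon)/2$; both expressions are reconstructed from the constraints stated in the text rather than quoted from it.

```latex
\begin{align*}
F(\beta) &= \frac{\beta(1-2\beta)}{2-3\beta}, \\
F'(\beta) &= \frac{(1-4\beta)(2-3\beta) + 3\beta(1-2\beta)}{(2-3\beta)^2}
           = \frac{2(3\beta-1)(\beta-1)}{(2-3\beta)^2}, \\
F'(\beta) &= 0 \text{ on } \bigl(0,\tfrac12\bigr) \iff \beta = \tfrac13,
\qquad F\bigl(\tfrac13\bigr) = \frac{\tfrac13 \cdot \tfrac13}{1} = \frac19 .
\end{align*}
% F' is positive on (0, 1/3) and negative on (1/3, 1/2), so 1/3 is a maximum;
% with \lambda = \sqrt{B} this gives exactly the range \lambda < 1/3.
```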



7.2. Embedding large Hilbert cube in the rectifiable case. — This section is devoted to the proof of Theorem 1.5. The idea is to use the self-similarity at all scales of the unit cube $I^d$ to embed isometrically $\mathrm{HC}(I^d,a)$ inside $W_2(I^d)$ for some sequences $a$. The claimed result will follow immediately, since a direct computation shows that a bi-Lipschitz embedding $A \to B$ gives a bi-Lipschitz embedding $W_2(A) \to W_2(B)$ by push forward.

The first step is to find appropriately scaled copies of $I^d$ in itself; the following is an elementary geometrical fact.

Lemma 7.1. — Let $c = (c_n)_n$ be an $\ell^d$ sequence of positive numbers. Then there exist a constant $K$ depending only on $d$ and $\|c\|_d$ and a family of homotheties $h_n : I^d \to I^d$ with disjoint images and ratio $Kc_n$.

Proof. — Of course, the existence of such homotheties is equivalent to the existence of disjoint cubes (oriented according to the coordinate axes) of side-length $Kc_n$ in the unit cube $I^d$. Note that the condition $c \in \ell^d$ is also necessary by volume considerations. Figure 2 illustrates the idea of the proof.

Since the result is independent of the order of the terms of $c$, and since $\lim c$ must be zero, we can assume that $c$ is non-increasing. Up to a dilation, we can moreover assume that $\|c\|_d \leq 1$, in particular $c_n \leq 1$ for all $n$.
Define recursively $n_0 = 0$ and

\[ n_{i+1} = \max\left\{ n \ \middle|\ \sum_{k=n_i+1}^{n} c_k^{d-1} \leq 1 \right\}. \]

It is possible that $n_i = +\infty$ for some $i$; let us momentarily assume it is not. We then have $\sum_{k=n_i+1}^{n_{i+1}} c_k^{d-1} \geq \frac12$, and

\begin{align*}
1 \geq \sum_{k=1}^{\infty} c_k^d &= \sum_{i=0}^{\infty} \sum_{k=n_i+1}^{n_{i+1}} c_k^d \\
&\geq \sum_{i=0}^{\infty} c_{n_{i+1}} \sum_{k=n_i+1}^{n_{i+1}} c_k^{d-1} \geq \frac12 \sum_{i=0}^{\infty} c_{n_{i+1}}.
\end{align*}

In other words, we have divided the terms of $c$ into groups of uniformly bounded $\ell^{d-1}$ norm, in such a way that the sequence of first terms of the groups is $\ell^1$. Of course, if $n_i$ is infinite for some $i$, then we have the same conclusion with one group that is infinite.

Consider inside $I^d$ non-overlapping slices of the form $S_i = I^{d-1} \times [a_i,b_i]$ such that $|b_i - a_i| \geq c_{n_i+1}$. By induction on $d$, for all $i$ we can find sub-cubes of $I^{d-1}$ of side length equal (up to a constant depending only on $d$) to $c_{n_i+1}, c_{n_i+2}, \dots, c_{n_{i+1}}$. We can therefore find $d$-dimensional cubes in $S_i$ of the same side lengths, and we are done. □

Given an $\ell^d$ positive sequence $c$, and up to taking a smaller factor $K$ than given in the Lemma, we can find homotheties of ratios $Kc_n$ such that the cubes $C_n = h_n(I^d)$ are not only disjoint, but satisfy the following separation property: for all $x,y \in C_n$ and all $z \in C_m$ with $m \neq n$, $d(x,y) < d(x,z)$.

Let $b = (b_n)$ be any positive $\ell^1$ sequence of sum 1, let $a_n = b_n^{1/2} c_n$ and consider the map

\[ h : \mathrm{HC}(I^d;a) \to W_2(I^d), \qquad x = (x_n)_n \mapsto \sum_{n=1}^{\infty} b_n\, \delta_{h_n(x_n)}. \]

The separation property on the cubes $C_n$ ensures that the optimal transport plan from $h(x)$ to $h(y)$ must be the obvious one, namely $\Pi = \sum_n b_n\, \delta_{h_n(x_n)} \otimes \delta_{h_n(y_n)}$. It has cost $\sum_n b_n K^2 c_n^2\, d^2(x_n,y_n) = K^2 d(x,y)^2$.

The question is now which sequences $a$ can be decomposed into a product of an $\ell^2$ and an $\ell^d$ sequence. If $a \in \ell^{2d/(d+2)}$, then one can take $b := a^{2d/(d+2)}$ and $c := a^{2/(d+2)}$, so that $a_n = b_n^{1/2} c_n$ holds and the sequences have the right summability properties to apply what precedes. We have proved the following.
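The exponents in this decomposition can be checked directly. Writing $b_n = a_n^{2d/(d+2)}$ and $c_n = a_n^{2/(d+2)}$ and ignoring normalizing constants (dividing $b$ by its sum only rescales $c$ by a constant factor), one gets:

```latex
\begin{align*}
b_n^{1/2}\, c_n &= a_n^{\frac{d}{d+2}} \cdot a_n^{\frac{2}{d+2}} = a_n, \\
\sum_n b_n &= \sum_n a_n^{\frac{2d}{d+2}} < \infty
  \quad\text{(so } b \in \ell^1 \text{)}, \\
\sum_n c_n^{d} &= \sum_n a_n^{\frac{2d}{d+2}} < \infty
  \quad\text{(so } c \in \ell^d \text{)}.
\end{align*}
```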
Theorem 7.2. — If $a$ is any positive $\ell^{2d/(d+2)}$ sequence, there is a map

\[ h : \mathrm{HC}(I^d;a) \to W_2(I^d) \]

that is a homothetic embedding in the sense that $d(h(x),h(y)) = K d(x,y)$ for some constant $K$.

Another way to put it is that there is a constant $K$ and an isometric embedding $\mathrm{HC}(I^d;Ka) \to W_2(I^d)$.

Note that Hölder's inequality shows that one cannot apply our strategy to sequences not in $\ell^{2d/(d+2)}$; as we shall see below, the upper bound of Theorem 1.7 shows that the exponent $2d/(d+2)$ cannot be improved in general, even for a mere bi-Lipschitz embedding.

7.3. Largeness of Wasserstein spaces. — Let us conclude with the proofs 
of largeness results claimed in the introduction. 

Proof of Proposition — Let X be a compact metric space of positive Hausdorff dimension and $\lambda \in (0,1/3)$. By the embedding Theorem 1.4 we have a continuous sub-Lipschitz embedding $\mathrm{HC}(X;(\lambda^n)) \to W_2(X)$. Proposition 4.2 tells us that

crit^,HC(X;(A"))^^^ 
2 log I 

and by Lipschitz monotonicity the same holds for $W_2(X)$. Letting $\lambda$ go to 1/3 finishes the proof. □

Last, the lower and upper bounds in Theorem 1.7 can be individually stated under more general hypotheses.

Proposition 7.3. — If X contains a bi-Lipschitz image of a Euclidean cube $I^d$, then $W_2(X)$ has at least power-exponential size, and more precisely

\[ \operatorname{crit}_{\mathcal{P}} W_2(X) \geq d. \]

Proof. — According to Theorem 1.5 there is a bi-Lipschitz embedding from $\mathrm{HC}(I^d;(n^{-\alpha}))$ to $W_2(X)$ for all $\alpha > (d+2)/(2d)$. Proposition 4.3 tells that $\mathrm{HC}(I^d;(n^{-\alpha}))$ has power-exponential critical parameter bounded below by $2/(2\alpha-1)$, which goes to $d$ when $\alpha$ approaches $(d+2)/(2d)$. Monotonicity gives the lower bound for $W_2(X)$. □

Let us use a counting argument to prove the following. 

Proposition 7.4. — If X is a compact metric space of finite upper Minkowski dimension $d$, then $W_2(X)$ has at most power-exponential size, and more precisely

\[ \operatorname{crit}_{\mathcal{P}} W_2(X) \leq d. \]
Proof. — Fix some $d' > d$; by assumption, for all small enough $\varepsilon$ it is possible to cover X by $D = (1/\varepsilon)^{d'}$ balls $(B_i)$ of diameter $\varepsilon$. By taking intersections with complements, we can instead assume that the $B_i$'s are disjoint Borel sets of diameters at most $\varepsilon$. Consider the map

\[ m : W_2(X) \to \mathbb{R}^D, \qquad \mu \mapsto (\mu(B_i))_i \]

and endow $\mathbb{R}^D$ with the $\ell^1$ metric. The map $m$ is not continuous, but whenever $E \subset \mathbb{R}^D$ has diameter at most $a$, we have

\[ \operatorname{diam} m^{-1}(E) \leq (\operatorname{diam} X)\sqrt{a} + \varepsilon. \]

Indeed, given two measures $\mu,\nu$ such that $\|m(\mu) - m(\nu)\|_1 \leq a$, we can first move an amount of mass $a$ of $\mu$ (by a distance at most $\operatorname{diam} X$) to get a measure $\mu'$ that has the same image as $\nu$ under $m$, then consider any transport plan from $\mu'$ to $\nu$ that is supported on $\bigcup_i B_i \times B_i$ (that is, move mass only inside each $B_i$). This last transport plan has cost at most $\varepsilon^2$ (so that $W_2(\mu',\nu) \leq \varepsilon$) and the triangular inequality provides the claimed bound.

Now, for all $D' > D$ and assuming $\varepsilon$ is small enough, it is possible to cover $m(W_2(X))$ by $(1/\varepsilon)^{2D'}$ balls $(E_j)$ of diameter at most $\varepsilon^2$. We get a covering $(m^{-1}(E_j))_j$ of $W_2(X)$ by $(1/\varepsilon)^{2D'}$ sets of diameters at most $(\operatorname{diam} X + 1)\varepsilon$. Writing $D' = D + \eta/2$, we get

\[ N(W_2(X), (\operatorname{diam} X + 1)\varepsilon) \leq \left( \frac1\varepsilon \right)^{2(1/\varepsilon)^{d'} + \eta} \]

so that $\operatorname{M-crit}_{\mathcal{P}} W_2(X) \leq d''$ for all $d'' > d' > d$, and we are done. □

Now Theorem 1.7 follows: X being a manifold, it has upper Minkowski dimension $d$ and contains a bi-Lipschitz image of $I^d$, so both bounds apply.



8. Largeness of subsets sets 

In this section, we briefly explain how to deduce Theorem 1.8 using the same methods as above.

Let us recall that, when X is a compact metric space, $\mathcal{C}(X)$ denotes the set of all closed subsets of X, endowed with the Hausdorff metric.

Generalizing Hilbert cubes, whenever $a = (a_n)_n$ is a sequence of positive reals such that $\lim_n a_n = 0$, let us denote by $\mathrm{BC}(X,\infty,a)$ the space $X^{\mathbb{N}}$ endowed with the metric

\[ d_a(x,y) = \sup_n a_n\, d(x_n,y_n). \]

Such a space shall be called a Banach cube while of course, topologically it is a Hilbert cube. One can similarly define Banach cubes $\mathrm{BC}(X,p,a)$ for any $p \in [1,\infty]$, but we do not need that level of generality. The methods we used to measure the size of Hilbert cubes are easily generalized to Banach cubes.

Proposition 8.1. — Let Y he a compact metric space of positive Hausdorff 
dimension and finite upper Minkowski dimension. Then for all positive a, it 
holds 

crit,<5* 60(^,00,(71-")) = - 

a 

Proof. — Up to a dilation, we can assume that $Y$ has unit diameter. Using
Frostman's lemma, given any $s < \dim Y$, there is a measure $\nu$ on $Y$ and a
constant $C$ such that $\nu(B(y, r)) \le C r^s$ for all $y \in Y$ and all $r > 0$. Denote
by $\mu$ the product measure $\otimes \nu$ on $B := BC(Y, \infty, (n^{-\alpha})) \simeq Y^{\mathbb{N}}$. Choose any
$\beta > \alpha$ and let $N$ be an integer-valued function such that $N(r) \sim r^{-1/\beta}$. Then
for all $y \in B$ and $r > 0$ we have

$$B(y, r) \subset \prod_{n=1}^{N(r)} B(y_n, n^{\alpha} r) \times Y^{\mathbb{N}}$$

and a quick computation shows that there is a constant $D$ such that

$$\log \mu(B(y, r)) \le -D \left(\frac{1}{r}\right)^{1/\beta} \log \frac{1}{r}$$

so that $\operatorname{crit}_{\mathcal{P}_2} B \ge \frac{1}{\beta}$. Letting $\beta \to \alpha$, we have the desired lower bound.
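
The "quick computation" can be spelled out roughly as follows. This is only a sketch under two assumptions that the text does not state explicitly: that $\nu$ is normalized to be a probability measure, and that $N(r) \sim r^{-1/\beta}$ means $\log N(r) = \frac{1}{\beta}\log\frac{1}{r} + O(1)$.

```latex
\begin{align*}
\log \mu(B(y,r))
  &\le \sum_{n=1}^{N(r)} \log\bigl(C\,(n^{\alpha} r)^{s}\bigr)
   = N(r)\log C + s\alpha \log\bigl(N(r)!\bigr) - s\,N(r)\log\tfrac{1}{r} \\
  &\le N(r)\Bigl(\log C + s\alpha \log N(r) - s\log\tfrac{1}{r}\Bigr)
   \qquad\text{since } \log(N!) \le N\log N \\
  &= N(r)\Bigl(O(1) - s\bigl(1 - \tfrac{\alpha}{\beta}\bigr)\log\tfrac{1}{r}\Bigr).
\end{align*}
% Since beta > alpha, the coefficient s(1 - alpha/beta) is positive, so for
% small r the bracket is at most -(s/2)(1 - alpha/beta) log(1/r); combined with
% N(r) >= c r^{-1/beta} this gives the bound -D (1/r)^{1/beta} log(1/r).
```

The only place where $\beta > \alpha$ is used is the sign of $1 - \alpha/\beta$: with $\beta = \alpha$ the leading terms would cancel and the estimate would give nothing.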

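The upper bound is obtained as usual using the upper Minkowski critical
parameter. There is an integer-valued function $M$ such that $M(\varepsilon) \sim \varepsilon^{-1/\alpha}$
and $\operatorname{diam}(n^{-\alpha} Y) \le \varepsilon$ for all $n \ge M(\varepsilon)$, where $n^{-\alpha} Y$ denotes $Y$ with its
metric multiplied by $n^{-\alpha}$. Writing $B = \prod_n n^{-\alpha} Y$ and covering each
of the first $M(\varepsilon)$ factors by $C\varepsilon^{-d}$ balls of diameter $\varepsilon$, where $C, d$ are constants depending
on $Y$, we see that $B$ can be covered by at most

$$\left(C \varepsilon^{-d}\right)^{M(\varepsilon)}$$

balls of diameter $\varepsilon$, and the result follows. □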
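
To see numerically why this covering argument points to the value $1/\alpha$, note that covering each of the first $M(\varepsilon)$ factors by $C\varepsilon^{-d}$ balls yields at most $(C\varepsilon^{-d})^{M(\varepsilon)}$ product sets. The sketch below assumes that the power-exponential scale compares covering numbers with $\exp((1/\varepsilon)^s)$, so that the relevant quantity is $\log\log N / \log(1/\varepsilon)$. The function names and the sample constants `C = 2`, `d = 1` are hypothetical.

```python
import math


def log_cover_bound(eps, alpha, C=2.0, d=1.0):
    """Natural log of (C * eps^(-d))^M(eps), with M(eps) = ceil(eps^(-1/alpha))."""
    M = math.ceil(eps ** (-1.0 / alpha))
    return M * (math.log(C) + d * math.log(1.0 / eps))


def growth_exponent(eps, alpha, C=2.0, d=1.0):
    """log log(bound) / log(1/eps); expected to drift toward 1/alpha as eps -> 0."""
    return math.log(log_cover_bound(eps, alpha, C, d)) / math.log(1.0 / eps)


for eps in (1e-3, 1e-6, 1e-12):
    print(eps, growth_exponent(eps, alpha=0.5))
```

With $\alpha = 1/2$ the printed ratios decrease slowly toward $2 = 1/\alpha$; the slow convergence comes from the extra factor $\log(1/\varepsilon)$ in the logarithm of the bound.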

Now we can deduce the first part of Theorem 1.8.

Proposition 8.2. — If $X$ contains a bi-Lipschitz image of a Euclidean cube
$I^d$, then $\mathcal{C}(X)$ has at least power-exponential size, and more precisely

$$\operatorname{crit}_{\mathcal{P}_2} \mathcal{C}(X) \ge d.$$

Proof. — Using Lemma 7.1, for all $d' < d$ there are homotheties $(h_n)_{n \in \mathbb{N}}$ of
ratio $C n^{-1/d'}$ from $I^d$ to $I^d$, with $h_n(I^d)$ and $h_m(I^d)$ separated by a distance at
least $C n^{-1/d'}$ for all $n \neq m$. Then the map

$$BC(I^d, \infty, (n^{-1/d'})) \to \mathcal{C}(I^d), \qquad (x_1, x_2, \dots) \mapsto \{h_n(x_n) \mid n \in \mathbb{N}\}$$

defines a homothetic embedding of ratio $C$.

Proposition 8.1 and the monotonicity property give the result. □
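
The embedding in this proof can be checked numerically in a toy case. The sketch below takes $d = 1$, truncates to finitely many coordinates, and builds its own homotheties of $I = [0,1]$ into the real line (not into $I$, and not those of Lemma 7.1): image $n$ has length $c\,n^{-e}$ and is followed by a gap of the same length. All function names and parameter values are hypothetical.

```python
import random


def make_homotheties(n_terms, exponent, c=1.0):
    """Pairs (offset, ratio) describing h_n(x) = offset + ratio * x on I = [0, 1].

    ratio_n = c * n^(-exponent); image n is followed by a gap of length ratio_n,
    so distinct images are at distance at least the larger of their two ratios.
    """
    maps, position = [], 0.0
    for n in range(1, n_terms + 1):
        ratio = c * n ** (-exponent)
        maps.append((position, ratio))
        position += 2.0 * ratio  # length of the image plus the gap after it
    return maps


def hausdorff(A, B):
    """Hausdorff distance between two finite nonempty sets of real numbers."""
    def one_sided(P, Q):
        return max(min(abs(p - q) for q in Q) for p in P)
    return max(one_sided(A, B), one_sided(B, A))


def embed(x, maps):
    """The finite set {h_n(x_n)} associated with the truncated sequence x."""
    return [offset + ratio * xn for (offset, ratio), xn in zip(maps, x)]


def weighted_sup(x, y, maps):
    """max over n of ratio_n * |x_n - y_n|, i.e. c times the Banach cube metric."""
    return max(ratio * abs(xn - yn) for (_, ratio), xn, yn in zip(maps, x, y))


random.seed(0)
maps = make_homotheties(n_terms=30, exponent=2.0)
x = [random.random() for _ in maps]
y = [random.random() for _ in maps]
print(hausdorff(embed(x, maps), embed(y, maps)), weighted_sup(x, y, maps))
```

The two printed numbers agree up to rounding. The separation of the images is what forces the nearest point to $h_n(x_n)$ in the second set to be $h_n(y_n)$, so the Hausdorff distance reduces to a supremum over coordinates.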

Finally, the following result ends the proof of Theorem 1.8. It is proved
just like its Wasserstein analogue.

Proposition 8.3. — If $X$ is a compact metric space of finite upper Minkowski
dimension $d$, then $\mathcal{C}(X)$ has at most power-exponential size, and more
precisely

$$\operatorname{crit}_{\mathcal{P}_2} \mathcal{C}(X) \le d.$$

Proof. — Fix some $d' > d$; for all small enough $\varepsilon$ it is possible to cover $X$ by
$D = (1/\varepsilon)^{d'}$ disjoint Borel sets $(B_i)$ of diameter at most $\varepsilon$. The map

$$m : \mathcal{C}(X) \to \{0, 1\}^D, \qquad A \mapsto (m_i(A))_i$$

defined by $m_i(A) = 1$ if and only if $A \cap B_i \neq \emptyset$ has the property that every
point in $\{0, 1\}^D$ has an inverse image of diameter at most $\varepsilon$.

We get a covering of $\mathcal{C}(X)$ by $2^D$ sets of diameter at most $\varepsilon$, and it follows that

$$\operatorname{M\text{-}crit}_{\mathcal{P}_2} \mathcal{C}(X) \le d'.$$

This being valid for all $d' > d$, we get the desired result. □
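
The key property of the map $m$ can be illustrated on a toy example. The sketch below assumes $X = [0, 1)$ cut into cells $B_i = [i\varepsilon, (i+1)\varepsilon)$ and works with finite subsets only. It checks that two sets meeting exactly the same cells are within Hausdorff distance $\varepsilon$ of each other. The helper names and the sampling scheme are hypothetical.

```python
import math
import random


def occupancy(A, eps):
    """m(A) for X = [0, 1): indices i such that A meets B_i = [i*eps, (i+1)*eps)."""
    return frozenset(math.floor(a / eps) for a in A)


def hausdorff(A, B):
    """Hausdorff distance between two finite nonempty sets of real numbers."""
    def one_sided(P, Q):
        return max(min(abs(p - q) for q in Q) for p in P)
    return max(one_sided(A, B), one_sided(B, A))


def sample_with_pattern(pattern, eps, rng):
    """A random finite set with between 1 and 3 points in each listed cell."""
    return [(i + 0.9 * rng.random()) * eps
            for i in pattern
            for _ in range(rng.randint(1, 3))]


rng = random.Random(0)
eps = 0.125  # D = 8 cells, hence at most 2^8 occupancy patterns
pattern = [0, 3, 4, 7]
A = sample_with_pattern(pattern, eps, rng)
B = sample_with_pattern(pattern, eps, rng)
print(occupancy(A, eps) == occupancy(B, eps), hausdorff(A, B) <= eps)
```

Every point of one set shares a cell with some point of the other, which is the whole argument. Counting patterns gives at most $2^D$ classes, that is $\log N \le (1/\varepsilon)^{d'} \log 2$.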